Abstract. We consider the regression model with errors-in-variables where we observe $n$ i.i.d. copies of $(Y, Z)$ satisfying $Y = f(X) + \xi$, $Z = X + \sigma\varepsilon$, involving independent and unobserved random variables $X$, $\xi$, $\varepsilon$. The density $g$ of $X$ is unknown, whereas the density of $\sigma\varepsilon$ is completely known. Using the observations $(Y_i, Z_i)$, $i = 1, \ldots, n$, we propose an estimator of the regression function $f$, built as the ratio of two penalized minimum contrast estimators of $\ell = fg$ and $g$, without any prior knowledge on their smoothness. We prove that its $L_2$-risk on a compact set is bounded by the sum of the two $L_2(\mathbb{R})$-risks of the estimators of $\ell$ and $g$, and give the rate of convergence of such estimators for various smoothness classes for $\ell$ and $g$, when the errors $\varepsilon$ are either ordinary smooth or super smooth. The resulting rate is optimal in a minimax sense in all cases where lower bounds are available.

1. Introduction

We observe $n$ independent and identically distributed (i.i.d.) copies of $(Y, Z)$ satisfying the following errors-in-variables regression model

(1.1)  $Y = f(X) + \xi, \qquad Z = X + \sigma\varepsilon,$

involving independent and unobserved random variables $X$, $\xi$, $\varepsilon$ and an unknown regression function $f$. The unobserved $X_i$'s have common unknown density denoted by $g$. The errors $\varepsilon_i$'s have common known density $f_\varepsilon$, and $\sigma$ is the known noise level. We assume moreover that all random variables have finite variance. Our aim is to estimate the regression function $f$ on a compact set denoted by $A$, by using the observations $(Y_i, Z_i)$ for $i = 1, \ldots, n$, without any prior knowledge on the smoothness of either $f$ or the density $g$.

In nonparametric errors-in-variables regression models, two factors determine the estimation accuracy of the regression function: first, the smoothness of the function $f$ to be estimated, and second, the smoothness of the errors density $f_\varepsilon$. As in the deconvolution framework, the worst rates of convergence are obtained for the smoothest errors densities $f_\varepsilon$. In this context, two classes of errors are considered: first, the so-called ordinary smooth errors, whose Fourier transform has polynomial decay, and second, the super smooth errors, whose Fourier transform has exponential decay.

Many papers deal with parametric or semi-parametric estimation in errors-in-variables models, but we only mention here previously known results in the general nonparametric case. In this context most of the proposed estimators are Nadaraya-Watson kernel type estimators, constructed as the ratio of two deconvolution kernel type estimators, see e.g. Fan et al. (1991), Fan and Masry (1992), Fan and Truong (1993), Masry (1993), Truong (1991), Ioannides and Alevizos (1997). One assumption usually made in all those works is that the regularity of the regression function $f$ and the regularity of the density $g$ of the design are equal. In particular, when the regression function $f$ and the density $g$ admit $k$th-order derivatives, Fan and Truong (1993) give upper and lower bounds of the minimax risk for quadratic pointwise risk and for $L_p$-risk on compact sets for ordinary and super smooth errors $\varepsilon$.

In a slightly different way, Koo and Lee (1998) propose an estimation method based on B-splines, when the errors are ordinary smooth. This method also estimates the regression function as a ratio of two estimators.

To our knowledge, all previous papers consider that the regression function and the density 
g belong to the same smoothness class and that this common class is known. 

We propose here an estimation procedure for $f$ that does not require any prior knowledge on the regularity of the unknown functions $f$ and $g$. Our estimation procedure is based on the classical idea that the regression function $f$ at point $x$ can be written as the ratio

$f(x) = E(Y \mid X = x) = \frac{\int y f_{X,Y}(x, y)\,dy}{g(x)} = \frac{(fg)(x)}{g(x)},$

with $f_{X,Y}$ the joint density of $(X, Y)$. Hence $f$ is estimated by the ratio of an adaptive estimator $\tilde\ell$ of $\ell = fg$ and an adaptive estimator $\tilde g$ of $g$, both of them being built by minimization of penalized contrast functions. The contrasts are determined by projection methods and the penalizations give an automatic choice of the relevant projection spaces.

We give upper bounds on the $L_2$-risk on a compact set for the regression function $f$ as well as for the $L_2(\mathbb{R})$-risk of the density $g$ when the errors are either ordinary or super smooth. We show in particular that the $L_2$-risk on a compact set of our estimator $\tilde f$ of $f$ is bounded by the sum of the risks of $\tilde\ell$ and $\tilde g$. The rate of convergence of $\tilde f$ is thus given by the slower of the rate of the adaptive estimation of $g$ and the rate of the adaptive estimation of $\ell = fg$. The resulting estimator automatically reaches the minimax rates in standard cases where lower bounds are available. The other cases are discussed in detail. In other words, our procedure provides an adaptive estimator, in the sense that its construction does not require any prior knowledge on the smoothness of $f$ or $g$, and it often seems optimal.

The paper is organized as follows. In Section 2, we describe the estimators. Section 3 is devoted to the presentation of the upper bounds for the resulting $L_2$-risks, with some discussion of the optimality of the estimators in the minimax sense. All proofs and technical lemmas are gathered in Section 4.

2. Description of the estimators 

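For $u$ and $v$ in $L_2(\mathbb{R})$, $u^*$ is the Fourier transform of $u$ with $u^*(x) = \int e^{itx}u(t)\,dt$, $u * v$ is the convolution product, $u * v(x) = \int u(y)v(x - y)\,dy$, and $\langle u, v\rangle = \int u(x)\overline{v(x)}\,dx$ with $z\bar z = |z|^2$. The quantities $\|u\|_1$, $\|u\|_2$, $\|u\|_\infty$ and $\|u\|_{\infty,K}$ denote $\|u\|_1 = \int |u(x)|\,dx$, $\|u\|_2^2 = \int |u(x)|^2\,dx$, $\|u\|_\infty = \sup_{x\in\mathbb{R}}|u(x)|$, $\|u\|_{\infty,K} = \sup_{x\in K}|u(x)|$.

Subsequently we assume that $f_\varepsilon \in L_2(\mathbb{R})$, $f_\varepsilon^* \in L_2(\mathbb{R})$ with $f_\varepsilon^*(x) \neq 0$ for all $x \in \mathbb{R}$.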

2.1. Projection spaces. Consider $\varphi(x) = \sin(\pi x)/(\pi x)$, and $\varphi_{m,j}(x) = \sqrt{D_m}\,\varphi(D_m x - j)$. Here, we take $D_m = m$ and $m \in \mathcal{M}_n = \{1, \ldots, m_n\}$, but when $D_m = 2^m$, the basis $\{\varphi_{m,j}\}_{j\in\mathbb{Z}}$ is known as the Shannon basis. It is well known (see for instance Meyer (1990), p. 22) that $\{\varphi_{m,j}\}_{j\in\mathbb{Z}}$ is an orthonormal basis of the space $S_m$ of square integrable functions having a Fourier transform with compact support contained in $[-\pi D_m, \pi D_m]$, that is

$S_m = \mathrm{Vect}\{\varphi_{m,j},\ j \in \mathbb{Z}\} = \{f \in L_2(\mathbb{R}),\ \text{with supp}(f^*) \text{ contained in } [-\pi D_m, \pi D_m]\}.$

The orthogonal projections of $g$ and $\ell$ on $S_m$, namely $g_m = \sum_{j\in\mathbb{Z}} a_{m,j}(g)\varphi_{m,j}$ and $\ell_m = \sum_{j\in\mathbb{Z}} a_{m,j}(\ell)\varphi_{m,j}$ with $a_{m,j}(g) = \langle\varphi_{m,j}, g\rangle$ and $a_{m,j}(\ell) = \langle\varphi_{m,j}, \ell\rangle$, involve infinite sums. We therefore consider in practice the truncated spaces $S_m^{(n)}$ defined as

$S_m^{(n)} = \mathrm{Vect}\{\varphi_{m,j},\ |j| \le k_n\}$

where $k_n$ is an integer to be chosen later. The family $\{\varphi_{m,j}\}_{|j|\le k_n}$ is an orthonormal basis of $S_m^{(n)}$, and the orthogonal projections of $g$ and $\ell$ on $S_m^{(n)}$, denoted by $g_m^{(n)}$ and $\ell_m^{(n)}$, are given by

$g_m^{(n)} = \sum_{|j|\le k_n} a_{m,j}(g)\varphi_{m,j}$ and $\ell_m^{(n)} = \sum_{|j|\le k_n} a_{m,j}(\ell)\varphi_{m,j}$.
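As a small numerical illustration of the basis above, the following Python sketch evaluates $\varphi$ and $\varphi_{m,j}$ with $D_m = m$ and approximates inner products by a trapezoidal rule on a truncated interval. The function names, the truncation width and the step size are illustrative choices made here, not part of the paper.

```python
import math


def phi(x):
    """Sinc function sin(pi x) / (pi x), extended by continuity at 0."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def phi_mj(x, m, j):
    """Basis function sqrt(D_m) * phi(D_m x - j), with D_m = m."""
    d_m = float(m)
    return math.sqrt(d_m) * phi(d_m * x - j)


def inner_product(m, j, k, half_width=50.0, step=0.05):
    """Trapezoidal approximation of <phi_{m,j}, phi_{m,k}> on [-half_width, half_width].

    The basis functions decay only like 1/|x|, so truncating the real line
    leaves an error of order 1/(m * half_width).
    """
    n_steps = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(n_steps + 1):
        x = -half_width + i * step
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * phi_mj(x, m, j) * phi_mj(x, m, k)
    return total * step
```

With these settings, `inner_product(2, 0, 0)` is close to 1 and `inner_product(2, 0, 3)` is close to 0, up to the truncation error mentioned in the docstring.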

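2.2. Construction of the minimum contrast estimators. For $r \in \mathbb{R}$ and $d > 0$, we denote by $r^{(d)} = \mathrm{sign}(r)\min(|r|, d)$, and thus define the trimmed estimator of $f$ by

(2.1)  $\hat f_{\breve m_\ell, \breve m_g} = (\hat\ell_{\breve m_\ell} / \hat g_{\breve m_g})^{(a_n)},$

with $a_n$ being suitably chosen, $\breve m_\ell$ and $\breve m_g$ minimizing the $L_2(\mathbb{R})$ risks of $\hat\ell_{m_\ell}$, the projection estimator on a space $S_{m_\ell}^{(n)}$, and of $\hat g_{m_g}$, the projection estimator on a space $S_{m_g}^{(n)}$, defined as follows.

The estimator of $\ell$ is defined by

(2.2)  $\hat\ell_m = \arg\min_{t\in S_m^{(n)}} \gamma_{n,\ell}(t),$

with $\gamma_{n,\ell}$ defined, for $t \in S_m^{(n)}$, by

(2.3)  $\gamma_{n,\ell}(t) = \|t\|_2^2 - 2n^{-1}\sum_{i=1}^n Y_i u_t^*(Z_i)$ with $u_t(x) = \frac{1}{2\pi}\frac{t^*(-x)}{f_\varepsilon^*(\sigma x)},$

that is $\hat\ell_m = \sum_{|j|\le k_n} \hat a_{m,j}(\ell)\varphi_{m,j}$ with $\hat a_{m,j}(\ell) = n^{-1}\sum_{i=1}^n Y_i u^*_{\varphi_{m,j}}(Z_i)$.
By using Parseval and inverse Fourier formulas, and writing $f_{\sigma\varepsilon}$ for the density of $\sigma\varepsilon$, we get that

$E(Y_1 u_t^*(Z_1)) = E(f(X_1)u_t^*(Z_1)) = \langle u_t^* * f_{\sigma\varepsilon}(-\cdot), fg\rangle = \frac{1}{2\pi}\Big\langle \frac{f_\varepsilon^*(\sigma\cdot)\,t^*}{f_\varepsilon^*(\sigma\cdot)}, (fg)^*\Big\rangle = \frac{1}{2\pi}\langle t^*, (fg)^*\rangle = \langle t, \ell\rangle.$

Therefore, we find that $E(\gamma_{n,\ell}(t)) = \|t\|_2^2 - 2\langle\ell, t\rangle = \|t - \ell\|_2^2 - \|\ell\|_2^2$, which is minimal when $t = \ell$. This shows that $\gamma_{n,\ell}(t)$ is well suited to the estimation of $\ell = fg$.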
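The coefficients $\hat a_{m,j}(\ell)$ can be computed numerically once $u^*_{\varphi_{m,j}}$ is available. Since $\varphi^* = 1_{[-\pi,\pi]}$, one gets, for a real and even $f_\varepsilon^*$, $u^*_{\varphi_{m,j}}(z) = (\pi\sqrt{D_m})^{-1}\int_0^{\pi D_m}\cos(x(z - j/D_m))/f_\varepsilon^*(\sigma x)\,dx$. The Python sketch below evaluates this integral with Simpson's rule; the quadrature, the function names and the choice of Laplace noise are assumptions made for illustration, not the implementation used by the authors.

```python
import math


def simpson(func, a, b, steps=2000):
    """Composite Simpson rule on [a, b] with an even number of steps."""
    if steps % 2:
        steps += 1
    h = (b - a) / steps
    total = func(a) + func(b)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * func(a + k * h)
    return total * h / 3.0


def laplace_cf(x):
    """Fourier transform of the density exp(-|x|)/2 (illustrative noise choice)."""
    return 1.0 / (1.0 + x * x)


def u_star_phi(z, m, j, sigma, noise_cf):
    """u*_{phi_{m,j}}(z) for a real, even noise transform, with D_m = m."""
    d_m = float(m)
    shift = z - j / d_m

    def integrand(x):
        return math.cos(x * shift) / noise_cf(sigma * x)

    return simpson(integrand, 0.0, math.pi * d_m) / (math.pi * math.sqrt(d_m))


def coefficient_estimate(ys, zs, m, j, sigma, noise_cf):
    """Empirical coefficient n^{-1} * sum_i Y_i * u*_{phi_{m,j}}(Z_i)."""
    n = len(ys)
    return sum(y * u_star_phi(z, m, j, sigma, noise_cf) for y, z in zip(ys, zs)) / n
```

When `sigma` is 0 the deconvolution step disappears and `u_star_phi(z, m, j, 0.0, noise_cf)` reduces to $\varphi_{m,j}(z)$, which gives a simple sanity check.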

By using the estimation procedure described in Comte et al. (2005a), the estimator of $g$ on $S_m^{(n)}$ is defined by $\hat g_m = \sum_{|j|\le k_n} \hat a_{m,j}(g)\varphi_{m,j}$ with $\hat a_{m,j}(g) = n^{-1}\sum_{i=1}^n u^*_{\varphi_{m,j}}(Z_i)$, that is

(2.4)  $\hat g_m = \arg\min_{t\in S_m^{(n)}} \gamma_{n,g}(t)$

with $\gamma_{n,g}$ defined, for $t \in S_m^{(n)}$, by $\gamma_{n,g}(t) = \|t\|_2^2 - 2n^{-1}\sum_{i=1}^n u_t^*(Z_i)$, with $u_t$ defined in (2.3).

Remark 2.1. The trimming at level $a_n$ avoids the problems that may occur when $\hat g_{\breve m_g}$ takes small values.
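The trimming operation $r^{(d)} = \mathrm{sign}(r)\min(|r|, d)$ and the resulting ratio can be sketched in a few lines of Python. The handling of an exactly zero denominator is a convention adopted here for the sketch only; the paper does not specify it.

```python
import math


def trim(r, d):
    """Trimmed value r^(d) = sign(r) * min(|r|, d), for d > 0."""
    return math.copysign(min(abs(r), d), r)


def trimmed_ratio(l_hat, g_hat, a_n):
    """Pointwise value of (l_hat / g_hat)^(a_n).

    Assumption of this sketch: a zero denominator is mapped to 0 when the
    numerator is also 0, and to +/- a_n (sign of the numerator) otherwise.
    """
    if g_hat == 0.0:
        return 0.0 if l_hat == 0.0 else math.copysign(a_n, l_hat)
    return trim(l_hat / g_hat, a_n)
```

For instance, `trimmed_ratio(1.0, 1e-9, 10.0)` returns `10.0` instead of the very large raw ratio, which is the stabilizing effect described in Remark 2.1.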

2.3. Construction of the minimum penalized contrast estimators. In order to construct the minimum penalized contrast estimators, and especially to define the penalty functions, we need to specify the behavior of $f_\varepsilon^*$, described as follows. We assume that, for all $x$ in $\mathbb{R}$,

$(A_1)$  $\kappa_0 (x^2 + 1)^{-\alpha/2}\exp\{-\beta|x|^\rho\} \le |f_\varepsilon^*(x)| \le \kappa_0' (x^2 + 1)^{-\alpha/2}\exp\{-\beta|x|^\rho\}.$

Only the left-hand side of $(A_1)$ is required to define the penalty function and for upper bounds. The right-hand side is needed when we consider lower bounds and the question of optimality in a minimax sense. When $\rho = 0$, $\alpha$ has to be such that $\alpha > 1/2$. When $\rho = 0$ in $(A_1)$, the errors are usually called "ordinary smooth" errors, and "super smooth" errors when $\rho > 0$. The standard examples are the following: Gaussian or Cauchy distributions are super smooth of order $(\alpha = 0, \rho = 2)$ and $(\alpha = 0, \rho = 1)$ respectively, and the double exponential distribution is ordinary smooth ($\rho = 0$) of order $\alpha = 2$.
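The three standard examples can be checked against the envelope in $(A_1)$ with a short Python sketch. The scalings below (unit-scale double exponential, standard Gaussian, standard Cauchy) and the values $\beta = 1/2$ and $\beta = 1$ are the ones implied by these particular normalizations, chosen here for illustration; with them the Fourier transforms coincide with the envelope for $\kappa_0 = \kappa_0' = 1$.

```python
import math


def envelope(x, kappa, alpha, beta, rho):
    """kappa * (x^2 + 1)^(-alpha/2) * exp(-beta * |x|^rho), as in (A_1)."""
    return kappa * (x * x + 1.0) ** (-alpha / 2.0) * math.exp(-beta * abs(x) ** rho)


def laplace_cf(x):
    """Double exponential density exp(-|x|)/2: ordinary smooth, alpha = 2, rho = 0."""
    return 1.0 / (1.0 + x * x)


def gaussian_cf(x):
    """Standard Gaussian density: super smooth, alpha = 0, beta = 1/2, rho = 2."""
    return math.exp(-0.5 * x * x)


def cauchy_cf(x):
    """Standard Cauchy density: super smooth, alpha = 0, beta = 1, rho = 1."""
    return math.exp(-abs(x))


# (transform, alpha, beta, rho) for each example; beta = 0 when rho = 0 by convention.
EXAMPLES = [
    (laplace_cf, 2.0, 0.0, 0.0),
    (gaussian_cf, 0.0, 0.5, 2.0),
    (cauchy_cf, 0.0, 1.0, 1.0),
]
```

Comparing `cf(x)` with `envelope(x, 1.0, alpha, beta, rho)` on a grid of points shows equality up to rounding for each entry of `EXAMPLES`.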

By convention, we set $\beta = 0$ when $\rho = 0$ and we assume that $\beta > 0$ when $\rho > 0$. In the same way, if $\sigma = 0$, the $X_i$'s are directly observed without noise and we set $\beta = \alpha = \rho = 0$.

Under the assumption $(A_1)$, the regression function $f$ is estimated by $\tilde f$ defined as

(2.5)  $\tilde f = (\tilde\ell / \tilde g)^{(a_n)},$

where $\tilde\ell$ is the adaptive estimator defined by

(2.6)  $\tilde\ell = \hat\ell_{\hat m_\ell}$ with $\hat m_\ell = \arg\min_{m\in\mathcal{M}_{n,\ell}} [\gamma_{n,\ell}(\hat\ell_m) + \mathrm{pen}_\ell(m)],$

and $\tilde g$ is the adaptive estimator defined, as in Comte et al. (2005a), by

(2.7)  $\tilde g = \hat g_{\hat m_g}$ with $\hat m_g = \arg\min_{m\in\mathcal{M}_{n,g}} [\gamma_{n,g}(\hat g_m) + \mathrm{pen}_g(m)],$

where $\mathcal{M}_{n,\ell}$ and $\mathcal{M}_{n,g}$ are some restrictions of $\mathcal{M}_n$ given below, and where $\mathrm{pen}_\ell$ and $\mathrm{pen}_g$ are data driven penalty functions given by

(2.8)  $\mathrm{pen}_\ell(m) = \kappa'(\lambda_1 + \mu_2)[1 + \hat m_2(Y)]\tilde\Gamma(m)/n, \qquad \mathrm{pen}_g(m) = \kappa(\lambda_1 + \mu_1)\tilde\Gamma(m)/n,$

with

(2.9)  $\hat m_2(Y) = \frac{1}{n}\sum_{i=1}^n Y_i^2$, and $\tilde\Gamma(m) = D_m^{2\alpha + \max(1-\rho,\ \min((1+\rho)/2,\ 1))}\exp\{2\beta\sigma^\rho(\pi D_m)^\rho\}.$

The constants $\lambda_1$, $\mu_1$ and $\mu_2$ are known constants, depending only on $f_\varepsilon$ and $\sigma$ (assumed to be known), to be defined later (see (3.4), (3.8) and (3.9)), and $\kappa$ and $\kappa'$ are numerical constants.

Remark 2.2. First note that the penalty functions in (2.8) have the same form with different constants. More precisely, in both cases, the penalties are of order $D_m^{2\alpha+1-\rho}\exp(2\beta\sigma^\rho(\pi D_m)^\rho)$ if $0 \le \rho \le 1/3$, of order $D_m^{2\alpha+(1+\rho)/2}\exp(2\beta\sigma^\rho(\pi D_m)^\rho)$ if $1/3 \le \rho \le 1$, and of order $D_m^{2\alpha+1}\exp(2\beta\sigma^\rho(\pi D_m)^\rho)$ if $\rho \ge 1$.

Second, the constants involve $\kappa$ and $\kappa'$, universal numerical constants, as well as the constants $\lambda_1$, $\mu_1$ and $\mu_2$ related to the known errors density $f_\varepsilon$. Any constant larger than a suitably chosen one also suits the theoretical results. In practice, such constants are usually calibrated by intensive simulation studies. We refer to Comte et al. (2005a, 2005b) for further details on penalty calibration as well as for details on the implementation of such estimators in density deconvolution problems.
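The three regimes of Remark 2.2 follow directly from the exponent $2\alpha + \max(1-\rho, \min((1+\rho)/2, 1))$ in (2.9). A minimal Python sketch, with illustrative function names and with $D_m = m$ passed directly as `d_m`, makes the dependence on $\rho$ explicit.

```python
import math


def penalty_exponent(alpha, rho):
    """Power of D_m in (2.9): 2*alpha + max(1 - rho, min((1 + rho)/2, 1))."""
    return 2.0 * alpha + max(1.0 - rho, min((1.0 + rho) / 2.0, 1.0))


def gamma_tilde(d_m, alpha, beta, rho, sigma):
    """Penalty order D_m^exponent * exp(2 * beta * sigma^rho * (pi * D_m)^rho)."""
    growth = math.exp(2.0 * beta * sigma ** rho * (math.pi * d_m) ** rho)
    return d_m ** penalty_exponent(alpha, rho) * growth


def pen_g(d_m, n, alpha, beta, rho, sigma, kappa, lambda_1, mu_1):
    """pen_g(m) of (2.8); kappa, lambda_1 and mu_1 must be supplied by the user."""
    return kappa * (lambda_1 + mu_1) * gamma_tilde(d_m, alpha, beta, rho, sigma) / n
```

For double exponential noise ($\alpha = 2$, $\rho = 0$, $\beta = 0$) the exponent is 5, and with $\sigma = 0$ and $\alpha = \beta = \rho = 0$ the function `gamma_tilde` returns `d_m` itself, the usual order of a penalty when the design is observed directly.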

3. Rates of convergence and adaptivity 



3.1. Assumptions. We consider Model (1.1) under $(A_1)$ and the following additional assumptions.

$(A_2)$  $\ell \in L_2(\mathbb{R})$ and $\ell \in \mathcal{L} = \{\phi \text{ such that } \int x^2\phi^2(x)\,dx \le \kappa_{\mathcal{L}} < \infty\}$,

$(A_3)$  $f \in \mathcal{F}_G = \{\phi \text{ such that } \sup_{x\in G}|\phi(x)| \le \kappa_{\infty,G} < \infty\}$, where $G$ is the support of $g$.

$(A_4)$  $g \in L_2(\mathbb{R})$ and $g \in \mathcal{G} = \{\phi, \text{ density, such that } \int x^2\phi^2(x)\,dx \le \kappa_{\mathcal{G}} < \infty\}$.

$(A_5)$  There exist positive constants $g_0$, $g_1$ such that for all $x \in A$, $g_0 \le g(x) \le g_1$.

Note that we do not assume that $g$ is compactly supported but only that $f$ is bounded on the support of $g$. It follows that if $g$ is compactly supported then $f$ has to be bounded on a compact set. But if $g$ has $\mathbb{R}$ as support then the regression function has to be bounded on $\mathbb{R}$. We estimate $f$ only on a compact set denoted by $A$. Hence, the assumption $(A_5)$ implies that $A \subset G$ and therefore, under $(A_3)$ and $(A_5)$, $f$ is bounded on $A$. The assumptions $(A_3)$ and $(A_4)$ imply that $(A_2)$ holds, with $\kappa_{\mathcal{L}} = \kappa_{\infty,G}^2\,\kappa_{\mathcal{G}}$.

Classically, the slowest rates of convergence for estimating $f$ and $g$ are obtained for super smooth errors densities. In particular, when $f_\varepsilon$ is the Gaussian density, the minimax rate of convergence obtained by Fan and Truong (1993) when $f$ and $g$ have the same Hölderian type regularity is of order a power of $\ln(n)$. Nevertheless, those rates can be improved by some additional regularity conditions on $f$ and $g$ described as follows.

$(R_1)$  $S_{a,r,B}(C_1) = \{\psi \in L_2(\mathbb{R}) : \int_{-\infty}^{+\infty} |\psi^*(x)|^2 (x^2 + 1)^a \exp\{2B|x|^r\}\,dx \le C_1\},$

for $a$, $r$, $B$, $C_1$ some nonnegative real numbers. The smoothness class in $(R_1)$ is classically considered in nonparametric estimation, especially in deconvolution. When $r = 0$, this corresponds to Sobolev spaces of order $a$. The densities belonging to $S_{a,r,B}(C_1)$ with $r > 0$, $B > 0$ are infinitely many times differentiable, and admit analytic continuation on a finite width strip when $r = 1$ and on the whole complex plane if $r = 2$.

3.2. Risk bounds for the minimum contrast estimators. We start by presenting some general bounds for the risk.

Proposition 3.1. Consider the estimators $\hat\ell_m$ and $\hat g_m$ of $\ell$ and $g$ defined by (2.2) and (2.4). Let $\Delta(m) = D_m\pi^{-1}\int_0^\pi |f_\varepsilon^*(D_m x\sigma)|^{-2}\,dx$. Then, under $(A_2)$ and $(A_4)$,

(3.1)  $E(\|\ell - \hat\ell_m\|_2^2) \le \|\ell - \ell_m\|_2^2 + 2E(Y_1^2)\Delta(m)/n + (\kappa_{\mathcal{L}} + \|\ell\|_1)D_m^2/k_n$

and

(3.2)  $E(\|g - \hat g_m\|_2^2) \le \|g - g_m\|_2^2 + 2\Delta(m)/n + (\kappa_{\mathcal{G}} + 1)D_m^2/k_n.$
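To get a feeling for the size of $\Delta(m)$, the Python sketch below evaluates it for double exponential noise, for which $|f_\varepsilon^*(u)|^{-2} = (1 + u^2)^2$ and the integral has a closed form. The midpoint quadrature, the function names and the choice of noise are assumptions made for illustration. The comparison constant $(\sigma^2\pi^2 + 1)^2$ used in the last comment is an elementary bound valid for this noise when $D_m \ge 1$, not a quantity quoted from the text.

```python
import math


def laplace_cf(x):
    """Fourier transform of the double exponential density exp(-|x|)/2."""
    return 1.0 / (1.0 + x * x)


def delta_numeric(d_m, sigma, noise_cf, steps=2000):
    """Delta(m) = (D_m / pi) * int_0^pi |f*(D_m x sigma)|^(-2) dx, midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += 1.0 / noise_cf(d_m * x * sigma) ** 2
    return d_m * total * h / math.pi


def delta_laplace(d_m, sigma):
    """Closed form for double exponential noise, using u = pi * D_m:
    pi^(-1) * (u + 2 sigma^2 u^3 / 3 + sigma^4 u^5 / 5)."""
    u = math.pi * d_m
    return (u + 2.0 * sigma ** 2 * u ** 3 / 3.0 + sigma ** 4 * u ** 5 / 5.0) / math.pi


# For D_m >= 1, delta_laplace(d_m, sigma) <= (sigma^2 pi^2 + 1)^2 * d_m^5,
# so the variance factor grows like D_m^(2 alpha + 1) with alpha = 2.
```

The leading term $\sigma^4\pi^4 D_m^5/5$ shows how quickly the variance term $\Delta(m)/n$ grows with the dimension, even for ordinary smooth noise.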

As in deconvolution problems, the variance term $\Delta(m)/n$ depends on the rate of decay of the Fourier transform $f_\varepsilon^*$, with larger variance for fast decreasing $f_\varepsilon^*$. Under $(A_1)$, the variance term is bounded in the following way

(3.3)  $\Delta(m) \le \lambda_1\Gamma(m)$ where $\Gamma(m) = D_m^{2\alpha+1-\rho}\exp(2\beta\sigma^\rho(\pi D_m)^\rho),$

with

(3.4)  $\lambda_1 = (\sigma^2\pi^2 + 1)^\alpha / (\pi^\rho\kappa_0^2 R(\beta, \sigma, \rho))$ with $R(\beta, \sigma, \rho) = 1_{\rho=0} + 2\beta\rho\sigma^\rho 1_{0<\rho\le 1} + 2\beta\sigma^\rho 1_{\rho>1}.$

In order to ensure that $\Gamma(m)/n$ is bounded, we only consider models such that $D_m = m \le m_n$, in $\mathcal{M}_n = \{1, \ldots, m_n\}$ with



(3.5) m n < 



7r -i n i/(2a+i) if p = 



ln(n) 2a + l-/o /hi(n)\ 



if p > 0. 



Lastly, the bias terms $\|\ell - \ell_m\|_2^2$ and $\|g - g_m\|_2^2$ depend, as usual, on the smoothness of the functions $\ell$ and $g$. They have the expected order for classical smoothness classes since they relate to the distance between $g$ and the classes of entire functions having Fourier transform compactly supported on $[-\pi D_m, \pi D_m]$ (see Ibragimov and Hasminskii (1983)).

Since $\ell_m$ and $g_m$ are the orthogonal projections of $\ell$ and $g$ on $S_m$, when $\ell$ belongs to $S_{a_\ell, r_\ell, B_\ell}(\kappa_{a_\ell})$ and $g$ belongs to $S_{a_g, r_g, B_g}(\kappa_{a_g})$ defined by $(R_1)$, then

(3.6)  $\|\ell - \ell_m\|_2^2 = (2\pi)^{-1}\int_{|x|\ge\pi D_m} |\ell^*|^2(x)\,dx \le [\kappa_{a_\ell}/(2\pi)](D_m^2\pi^2 + 1)^{-a_\ell}\exp\{-2B_\ell\pi^{r_\ell}D_m^{r_\ell}\},$

and the same holds for $\|g - g_m\|_2^2$ with $(a_\ell, B_\ell, r_\ell)$ replaced by $(a_g, B_g, r_g)$.

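Corollary 3.1. Under $(A_1)$, $(A_2)$ and $(A_4)$, let $\Gamma(m)$ and $\lambda_1$ be defined in (3.3) and (3.4). Assume that $k_n \ge n$, that $\ell$ belongs to $S_{a_\ell, r_\ell, B_\ell}(\kappa_{a_\ell})$ and that $g$ belongs to $S_{a_g, r_g, B_g}(\kappa_{a_g})$ defined by $(R_1)$. Then

$E(\|\ell - \hat\ell_m\|_2^2) \le \frac{\kappa_{a_\ell}}{2\pi}(D_m^2\pi^2 + 1)^{-a_\ell}e^{-2B_\ell\pi^{r_\ell}D_m^{r_\ell}} + 2\lambda_1 E(Y_1^2)\Gamma(m)/n + D_m^2(\kappa_{\mathcal{L}} + \|\ell\|_1)/n,$

and

$E(\|g - \hat g_m\|_2^2) \le \frac{\kappa_{a_g}}{2\pi}(D_m^2\pi^2 + 1)^{-a_g}e^{-2B_g\pi^{r_g}D_m^{r_g}} + 2\lambda_1\Gamma(m)/n + (\kappa_{\mathcal{G}} + 1)D_m^2/n.$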

Remark 3.1. We point out that the $\{\varphi_{m,j}\}$ are $\mathbb{R}$-supported (and not compactly supported) and hence, we obtain estimations of $\ell$ and $g$ on the whole line and not only on a compact set as for usual projection estimators. This is a great advantage of this basis even if, due to the truncation $|j| \le k_n$, it induces the residual terms $D_m^2(\kappa_{\mathcal{L}} + \|\ell\|_1)/k_n$ and $D_m^2(\kappa_{\mathcal{G}} + 1)/k_n$ in the upper bounds of the risks. The most important thing is that the choice of $k_n$ does not influence the other terms. Consequently, we can find a relevant choice of $k_n$ ($k_n \ge n$ under $(A_2)$ and $(A_4)$) that makes those additional terms unconditionally negligible with respect to the bias and variance terms. The condition $k_n \ge n$ allows us to construct truncated spaces $S_m^{(n)}$ using $O(n)$ basis vectors and hence to use a tractable and fast algorithm. The choice of a larger $k_n$, independent of $\ell$ and $g$, does not change the efficiency of our estimator from a statistical point of view but only changes the speed of the algorithm from a practical point of view.

| $\ell$ \ $f_\varepsilon$ | $\rho = 0$ (ordinary smooth) | $\rho > 0$ (super smooth) |
|---|---|---|
| $r_\ell = 0$ (Sobolev) | $\pi D_{\breve m_\ell} = O(n^{1/(2\alpha+2a_\ell+1)})$, rate $= O(n^{-2a_\ell/(2\alpha+2a_\ell+1)})$ | $\pi D_{\breve m_\ell} = [\ln(n)/(2\beta\sigma^\rho + 1)]^{1/\rho}$, rate $= O((\ln(n))^{-2a_\ell/\rho})$ |
| $r_\ell > 0$ ($C^\infty$) | $\pi D_{\breve m_\ell} = [\ln(n)/2B_\ell]^{1/r_\ell}$, rate $= O(\ln(n)^{(2\alpha+1)/r_\ell}/n)$ | $D_{\breve m_\ell}$ implicit solution of $D_{\breve m_\ell}^{2a_\ell+2\alpha+1-r_\ell}\exp\{2\beta\sigma^\rho(\pi D_{\breve m_\ell})^\rho + 2B_\ell\pi^{r_\ell}D_{\breve m_\ell}^{r_\ell}\} = O(n)$ |

Table 1. Best choices of $D_{\breve m_\ell}$ minimizing $E(\|\ell - \hat\ell_m\|_2^2)$ and resulting rates for $\hat\ell_{\breve m_\ell}$.



For the case $r_\ell > 0$ and $\rho > 0$, the choice $\pi D_{\breve m_\ell} = [\ln(n)/(2\beta\sigma^\rho + 1)]^{1/\rho}$ leads to a rate which is faster than any power of $\ln(n)$ and slower than any power of $n$. For instance if $r_\ell = \rho$, the rate is of order $[\ln(n)]^b n^{-B_\ell/(B_\ell+\beta\sigma^\rho)}$ with $b = [-2a_\ell\beta\sigma^\rho + (2\alpha - r_\ell + 1)B_\ell]/[r_\ell(\beta\sigma^\rho + B_\ell)]$.
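The ordinary smooth, Sobolev entry of Table 1 comes from balancing a squared bias of order $D^{-2a_\ell}$ against a variance of order $D^{2\alpha+1}/n$. The Python sketch below works with this simplified proxy, dropping all constants; the proxy and the function names are assumptions made for illustration.

```python
import math


def risk_proxy(d, n, a, alpha):
    """Simplified bias-variance proxy D^(-2a) + D^(2 alpha + 1) / n."""
    return d ** (-2.0 * a) + d ** (2.0 * alpha + 1.0) / n


def tradeoff_dimension(n, a, alpha):
    """Minimizer of the proxy over D > 0, from setting its derivative to zero:
    D^(2a + 2 alpha + 1) = 2 a n / (2 alpha + 1)."""
    return (2.0 * a * n / (2.0 * alpha + 1.0)) ** (1.0 / (2.0 * a + 2.0 * alpha + 1.0))


def rate_exponent(a, alpha):
    """Exponent in the rate n^(-2a / (2 alpha + 2a + 1))."""
    return 2.0 * a / (2.0 * alpha + 2.0 * a + 1.0)
```

Evaluating the proxy at `tradeoff_dimension(n, a, alpha)` for two sample sizes and comparing the values on a log scale recovers the slope `-rate_exponent(a, alpha)`, in agreement with the first cell of Table 1.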

The same table holds for $g$, by replacing $(a_\ell, B_\ell, r_\ell)$ by $(a_g, B_g, r_g)$. For $D_{\breve m_g}$ chosen in the same way as $D_{\breve m_\ell}$ in Table 1, the rate of convergence of $\hat g_{\breve m_g}$ is the minimax rate of convergence, as given in Fan (1991a) for $r_g = 0$, in Butucea (2004) for $r_g > 0$ and $\rho = 0$, and in Butucea and Tsybakov (2004) for $0 < r_g < \rho$ and $a_g = 0$.

The rate of convergence of $\hat f_{\breve m_\ell, \breve m_g}$ is given by the following proposition.

Proposition 3.2. Under $(A_1)$, $(A_2)$, $(A_3)$, $(A_4)$ and $(A_5)$, assume that $g$ belongs to some space $S_{a_g, r_g, B_g}(\kappa_{a_g})$ defined by $(R_1)$ with $a_g > 1/2$ if $r_g = 0$. Let $\hat f_{\breve m_\ell, \breve m_g}$ be defined by (2.1), with $\breve m_\ell$ and $\breve m_g$ such that $D_{\breve m_\ell}$ and $D_{\breve m_g}$ minimize the risks $E(\|\ell - \hat\ell_m\|_2^2)$ and $E(\|g - \hat g_m\|_2^2)$ respectively.
If a n = n k for k > 0, and k n > n 3 ^ 2 , then, for n great enough and C = KgQ 2 (l + gig$ 2 Koo,g), 

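(3.7)  $E\|(\hat f_{\breve m_\ell, \breve m_g} - f)1_A\|_2^2 \le C_0[E(\|\ell - \hat\ell_{\breve m_\ell}\|_2^2) + E(\|g - \hat g_{\breve m_g}\|_2^2)] + o(n^{-1}).$

If $a_g \le 1/2$ then we only have a result of type $\|(f - \hat f_{\breve m_\ell, \breve m_g})1_A\|_2^2 = O_p(\|\ell - \hat\ell_{\breve m_\ell}\|_2^2 + \|g - \hat g_{\breve m_g}\|_2^2)$. Also note that the result holds when the constant $\kappa_{\infty,G}$ is replaced by $\|f\|_{\infty,A}$ if $f$ is bounded on the compact set $A$.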

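The performance of $\hat f_{\breve m_\ell, \breve m_g}$ is given by the worse of the performances of $\hat\ell_{\breve m_\ell}$ and $\hat g_{\breve m_g}$. Let us be more precise in some examples. Under the assumptions of Proposition 3.2:

• If the $\varepsilon_i$'s are ordinary smooth,

  - If $r_\ell = r_g = 0$, $\pi D_{\breve m_\ell} = O(n^{1/(2a_\ell+2\alpha+1)})$ and $\pi D_{\breve m_g} = O(n^{1/(2a_g+2\alpha+1)})$, then

    $E(\|(f - \hat f_{\breve m_\ell, \breve m_g})1_A\|_2^2) \le O(n^{-2a^*/(2a^*+2\alpha+1)})$ with $a^* = \inf(a_\ell, a_g)$.

  - If $r_\ell > 0$, $r_g > 0$, $\pi D_{\breve m_\ell} = (\ln(n)/2B_\ell)^{1/r_\ell}$ and $\pi D_{\breve m_g} = (\ln(n)/2B_g)^{1/r_g}$, then

    $E(\|(f - \hat f_{\breve m_\ell, \breve m_g})1_A\|_2^2) \le O(\ln(n)^{(2\alpha+1)/r^*}/n)$ with $r^* = \inf(r_\ell, r_g)$.

• If the $\varepsilon_i$'s are super smooth and $r_\ell = r_g = 0$, $\pi D_{\breve m_\ell} = \pi D_{\breve m_g} = [\ln(n)/(2\beta\sigma^\rho + 1)]^{1/\rho}$, then

    $E(\|(f - \hat f_{\breve m_\ell, \breve m_g})1_A\|_2^2) \le O([\ln(n)]^{-2a^*/\rho})$ with $a^* = \inf(a_\ell, a_g)$.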

Since $\ell = fg$, the smoothness properties of $\ell$ are related to those of $f$ and of $g$.

When $\ell$ belongs to $S_{a_\ell, 0, B_\ell}(\kappa_{a_\ell})$ and $g$ belongs to $S_{a_g, 0, B_g}(\kappa_{a_g})$ with $a_\ell = a_g$, then the resulting rate is the minimax rate given in Fan and Truong (1993) for Hölderian regression functions and densities with the same regularity. It follows that our estimator seems optimal in that case. It is easy to see that the estimator is also optimal if $a_g \ge a_\ell$, that is when the density $g$ is smoother than the regression function $f$. But the optimality of the rate of $\hat f_{\breve m_\ell, \breve m_g}$ when $a_\ell > a_g$, that is when the regression function $f$ is smoother than $g$, remains an open question. This is a known drawback of Nadaraya-Watson type estimators for regression functions, constructed as ratios of estimators. In "classical" regression models, when the $X_i$'s are observed, many methods, such as local polynomial estimators or mean square estimators, avoid the need for regularity conditions on $g$ for the estimation of $f$. The point is that standard methods solving the regression problem do not seem to work in the errors-in-variables model, and it is an open problem to build an estimator of $f$ that does not require the estimation of the density $g$.

From the above results we see that the choice of the dimensions $D_{\breve m_\ell}$ and $D_{\breve m_g}$ that realize the best trade-off between the squared bias and the variance terms depends on the unknown regularity coefficients of the functions $\ell$ and $g$. In the next section we provide upper bounds for the risks of the penalized estimators, constructed without such smoothness knowledge.

3.3. Risk bounds for the minimum penalized contrast estimators: adaptation.

Theorem 3.1. Under the assumptions $(A_1)$, $(A_2)$ and $(A_4)$, let

\ i/p<l/3 

(3.8) p 1 = \ P(any\\ / \a,K ,f3,o-,p)(l + a 2 7c 2 ) a / 2 K 1 (2<K)- 1 /2 if 1/3 < p < 1, 

[ (3{o-Tr) p Xi(a, K Q ,f3,a,p) */p>l- 

and 

(3-9) fi 2 = A i ll{0<p<l/3}U{p>l} + A*l 1 1 /er || 2 1{l/3<p<l} - 
Let $k_n \ge n$, and let $\tilde\ell = \hat\ell_{\hat m_\ell}$ and $\tilde g = \hat g_{\hat m_g}$ be defined by (2.6) and (2.7), with $\mathrm{pen}_\ell$ and $\mathrm{pen}_g$ given by (2.8), for $\kappa$ and $\kappa'$ two universal numerical constants and $1 \le m \le m_n$, $m_n$ satisfying (3.5) and, if $\rho > 0$,

Vp 

(3.10) m n < tx- 1 



ln(n) 2a + min[(l/2 + p/2), 1] ^ / ln(n) \ 



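1) Adaptive estimation of $g$ (Comte et al. (2005a)).

Then $\tilde g$ satisfies $E(\|g - \tilde g\|_2^2) \le K\inf_{m\in\mathcal{M}_{n,g}}[\|g - g_m\|_2^2 + D_m^2(\kappa_{\mathcal{G}} + 1)/n + \mathrm{pen}_g(m)] + c/n$, where $K$ is a constant and $c$ is a constant depending on $f_\varepsilon$ and $\kappa_{\mathcal{G}}$.

2) Adaptive estimation of $\ell$. Under the assumption $(A_3)$, if $E|\xi_1|^8 < \infty$ then $\tilde\ell$ satisfies

$E(\|\ell - \tilde\ell\|_2^2) \le K'\inf_{m\in\mathcal{M}_{n,\ell}}[\|\ell - \ell_m\|_2^2 + D_m^2(\kappa_{\mathcal{L}} + \|\ell\|_1)/n + E(\mathrm{pen}_\ell(m))] + c'/n$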



where K' is a constant and d is a constant depending on f £ , kc, and 



i- 



Remark 3.2. In Theorem 3.1 the penalty is random since it involves the term $\hat m_2(Y)$, instead of the unknown quantity $E(Y_1^2)$ which appears first. The only price to pay for this substitution is the moment condition $E|\xi_1|^8 < \infty$ instead of $E|\xi_1|^6 < \infty$ if $E(Y_1^2)$ was in the penalty. Moreover, the term $E(\mathrm{pen}_\ell(m))$ in the bound is equal to $\mathrm{pen}_\ell(m)$ with $\hat m_2(Y)$ replaced by $E(Y_1^2)$.

Remark 3.3. According to Remark 2.2, the penalty functions are of order $\Gamma(m)/n$ if $0 \le \rho \le 1/3$, of order $D_m^{(3\rho-1)/2}\Gamma(m)/n$ if $1/3 \le \rho \le 1$ and of order $D_m^\rho\Gamma(m)/n$ if $\rho \ge 1$. When $\rho > 1/3$, the penalty functions $\mathrm{pen}_\ell(m)$ and $\mathrm{pen}_g(m)$ do not have exactly the order of the variance $\Gamma(m)/n$, but a loss of order $D_m^{\min[(3\rho/2-1/2),\,\rho]}$ occurs, that is of order $D_m^{(3\rho-1)/2}$ if $1/3 < \rho \le 1$ and of order $D_m^\rho$ if $\rho > 1$.

Remark 3.4. Rates of convergence of $\tilde g$. The rate of convergence of $\tilde g$ is the rate of convergence of $\hat g_{\breve m_g}$ when $0 \le \rho \le 1/3$ or when $\rho > 1/3$ and $r_g = 0$ or $r_g < \rho$. And there is a logarithmic loss, as a price to pay for adaptation, when $r_g \ge \rho > 1/3$. We refer to Comte et al. (2005a) for further comments on the optimality in a minimax sense of $\tilde g$.

Remark 3.5. Rates of convergence of $\tilde\ell$. The rates, similar to the rates of $\tilde g$, are easy to deduce from Theorem 3.1 as soon as $\ell = fg$ belongs to some smoothness class, and the question is whether the procedure can reach the rate of $\hat\ell_{\breve m_\ell}$, which uses the unknown smoothness parameter. If $\mathrm{pen}_\ell(m)$ has the same order as the variance order $\Gamma(m)/n$, then Theorem 3.1 guarantees an automatic trade-off between the squared bias term $\|\ell - \ell_m\|_2^2$ and the variance term, up to some multiplicative constant. Otherwise, there is some loss due to the adaptation. Let us be more precise.

If $0 \le \rho \le 1/3$, the errors $\varepsilon_i$'s are ordinary smooth or super smooth with $\rho \le 1/3$. If $\ell$ satisfies $(R_1)$, the squared bias is bounded by applying (3.6), which combined with the value of $\mathrm{pen}_\ell(m)$, of order $\Gamma(m)/n$ (see (3.3)), gives that the estimator $\tilde\ell$ automatically reaches the best rate achievable by the estimator $\hat\ell_{\breve m_\ell}$, as given in Table 1.

If $\rho > 1/3$ the penalty function $\mathrm{pen}_\ell(m)$ is slightly bigger than the variance order $\Gamma(m)/n$. The rate of convergence remains the best rate if the bias $\|\ell - \ell_m\|_2^2$ is the dominating term in the trade-off between $\|\ell - \ell_m\|_2^2$ and $\mathrm{pen}_\ell(m)$. When $r_\ell = 0$ and $\rho > 0$, the rate of order $(\ln(n))^{-2a_\ell/\rho}$ is given by the bias term, and the loss in the penalty function does not change the rate of the adaptive estimator $\tilde\ell$, which remains the best achievable rate $E\|\ell - \hat\ell_{\breve m_\ell}\|_2^2$. In the same way, when $0 < r_\ell < \rho$, the rate is given by the bias term and thus this loss does not affect the rate of convergence of $\tilde\ell$ either.

Let us now focus our discussion on the case where $\mathrm{pen}_\ell(m)$ can be the dominating term in the trade-off between $\|\ell - \ell_m\|_2^2$ and $\mathrm{pen}_\ell(m)$, that is when $r_\ell \ge \rho > 1/3$. In that case, there is a loss of order $D_m^{\min[(3\rho/2-1/2),\,\rho]}$ in the penalty function, compared to the variance term. But this happens in cases where the order of the optimal $D_m$ is less than $(\ln n)^{1/\rho}$ and consequently the loss in the rate is at most of order $\ln n$, when the rate is faster than logarithmic: therefore the loss appears only in cases where it can be seen as negligible.

In particular, there is no price to pay for the adaptation if the £j's are Gaussian and the e^s 
are ordinary smooth. Indeed, in that case, the rate of convergence of the penalized estimator £, 
without any knowledge on i or g, is the same as the rate given by the non penalized estimator 
£rh e , requiring the knowledge of smoothness parameters. But, if both the ^'s and the e^s are 
Gaussian, then p = 2 and a logarithmic negligible loss appears in the rate of I compared to the 
rate of £m r 

Theorem 3.2. Adaptive estimation of f . Under the assumptions |Ai| ), |A 2 D , \A 3 ]) , jA 4 D 
and |A 5 p , let f be defined by \2. with g and £ be defined in \2. 7| ) and \2.di) with rh g e M n , g 
satisfying \3. ,5j) and KS.lty) . D mng < (n/ ln(n)) 1/(2a+2) and me G M n i satisfying KS. b}) and 
iS.l(J\) . Assume that g belongs to some space S ag ^^B g {^a g ) defined by |Ri| ) with a g > 1/2 if 
r g = 0, and that E^l 8 < oo. If k n > n 3//2 , a n = n k for k > 0, for n large enough, C = SKg^ 2 
and Ci = AK'gQ 2 (2gl + 1)^4 G , then 




where $K$ and $K'$ are constants depending on $f_\varepsilon$, and $c$ is a constant depending on $f_\varepsilon$, $f$ and $g$.

As in Proposition 3.2, if $a_g \le 1/2$ then it may happen that $D_{\hat m_g} \ge n^{1/(2\alpha+2)}$, and in this case we only have a result in probability: $\|(f - \tilde f)1_A\|_2^2 = O_p(\|\ell - \tilde\ell\|_2^2 + \|g - \tilde g\|_2^2)$. Moreover, the result holds when the constant $\kappa_{\infty,G}$ is replaced by $\|f\|_{\infty,A}$ if $f$ is bounded on the compact set $A$. Also note that Remark 3.1 is still valid for all adaptive estimators.

Comments about the resulting rates for estimating $f$. First, the rate of convergence of $\tilde f$ is given by the worse of the rates of convergence of $\tilde\ell$ and $\tilde g$. Obviously all the comments about $\hat f_{\breve m_\ell, \breve m_g}$ related to this fact still hold here.

When $0 \le \rho \le 1/3$ or when $0 < \rho$ and $r_g < \rho$, then $\tilde f$ achieves the rate of convergence of $\hat f_{\breve m_\ell, \breve m_g}$, given by the worse of the rates of convergence of $E\|\hat\ell_{\breve m_\ell} - \ell\|_2^2$ and $E\|\hat g_{\breve m_g} - g\|_2^2$. And when $r_g \ge \rho > 1/3$ or $r_\ell \ge \rho > 1/3$, there is a logarithmic loss in the rate of convergence of $\tilde f$ compared to the rate of convergence of $\hat f_{\breve m_\ell, \breve m_g}$.

Since the regularity of $\ell$ is by definition the regularity of $fg$, the rate of convergence of $\tilde\ell$ in fact depends on smoothness properties of $f$ and $g$. As a consequence, if $\ell$ and $g$ belong respectively to $S_{a_\ell, r_\ell, B_\ell}(\kappa_{a_\ell})$ and $S_{a_g, r_g, B_g}(\kappa_{a_g})$, then the rate of convergence of $\tilde f$ is the rate of $\hat f_{\breve m_\ell, \breve m_g}$ when $0 \le \rho \le 1/3$. According to Fan and Truong (1993), this rate seems to be the minimax rate when $a_\ell \le a_g$ and $r_\ell = r_g = 0$. In the other cases, the question of the optimality in a minimax sense remains open. Even if the regression function is smoother than $g$ and $0 \le \rho \le 1/3$, the rate of convergence of $\tilde f$ has the order of the rate of convergence of $\hat f_{\breve m_\ell, \breve m_g}$, but we do not know if the rate of $\hat f_{\breve m_\ell, \breve m_g}$ is the minimax rate (see the comments following Proposition 3.2). When $\rho > 1/3$, a loss appears between the rate of convergence of $\tilde f$ and the rate of convergence of $\hat f_{\breve m_\ell, \breve m_g}$. This loss only appears when $r_\ell \ge \rho$ or $r_g \ge \rho$ (see the comments after Theorem 3.1), in cases where it is negligible with respect to the rate.

Remark 3.6. Obviously, the resulting rates for all estimators depend on the noise level $\sigma$. The first point is to note that if $\sigma = 0$, then by convention $\beta = \alpha = \rho = 0$, $\lambda_1 = 1$, and $Z = X$ is observed. In that case, $\Gamma(m)/n$, of order $D_m/n$, has the expected order for the variance term in "usual regression", when the explanatory variables are observed, and the same holds for the penalties $\mathrm{pen}_\ell$ and $\mathrm{pen}_g$. This order $D_m/n$ is the expected penalty order for density estimation and nonparametric regression estimation, when there is one model per dimension, as in our case.

The second point is to note that if $\sigma$ is small, then the procedure automatically selects a dimension $D_m$ close to the dimension that would be selected in "usual" density estimation and nonparametric regression estimation.

Concluding remarks 

Our estimation procedure provides an adaptive estimator in the sense that its construction does not require any prior knowledge on the smoothness parameters of the regression function $f$ and of the density $g$. This estimation procedure makes it possible to consider various smoothness classes for the regression function and for the density $g$ when the errors are either ordinary smooth or super smooth, and to give upper bounds for the risk in all the cases.

The resulting rates of convergence for the estimation of $f$ are given by the worse of the rate for the estimation of $fg$ and the rate for the estimation of $g$. Nevertheless, they are the minimax rates in cases where lower bounds are available. In the other cases, the resulting rates are in most cases the best rates achievable if the smoothness parameters were known. Some logarithmic loss, negligible compared to the order of the rate, appears, as a price to pay for the adaptation, when both the errors density $f_\varepsilon$ and $fg$ are super smooth with $f_\varepsilon$ strictly smoother than $fg$. This logarithmic loss appears when the influence of the noise $\sigma\varepsilon$ dominates the smoothness properties of $f$ and $g$.

4. Proofs 

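4.1. Proof of Proposition 3.1. By applying Definition (2.2), for any $m$ belonging to $\mathcal{M}_n$, $\hat\ell_m$ satisfies $\gamma_{n,\ell}(\hat\ell_m) - \gamma_{n,\ell}(\ell_m^{(n)}) \le 0$. Denoting by $\nu_n(t)$ the centered empirical process,

(4.1)  $\nu_n(t) = \frac{1}{n}\sum_{i=1}^n (Y_i u_t^*(Z_i) - \langle t, \ell\rangle),$

and by using that $t \mapsto u_t^*$ is linear, we get the following decomposition

(4.2)  $\gamma_{n,\ell}(t) - \gamma_{n,\ell}(s) = \|t - \ell\|_2^2 - \|s - \ell\|_2^2 - 2\nu_n(t - s)$

and therefore, since by Pythagoras Theorem $\|\ell - \ell_m^{(n)}\|_2^2 = \|\ell - \ell_m\|_2^2 + \|\ell_m - \ell_m^{(n)}\|_2^2$, we infer that $\|\ell - \hat\ell_m\|_2^2 \le \|\ell - \ell_m\|_2^2 + \|\ell_m - \ell_m^{(n)}\|_2^2 + 2\nu_n(\hat\ell_m - \ell_m^{(n)})$. Using that $\hat a_{m,j}(\ell) - a_{m,j}(\ell) = \nu_n(\varphi_{m,j})$, we get

(4.3)  $\nu_n(\hat\ell_m - \ell_m^{(n)}) = \sum_{|j|\le k_n}(\hat a_{m,j}(\ell) - a_{m,j}(\ell))\nu_n(\varphi_{m,j}) = \sum_{|j|\le k_n}[\nu_n(\varphi_{m,j})]^2,$

and consequently

(4.4)  $E\|\ell - \hat\ell_m\|_2^2 \le \|\ell - \ell_m\|_2^2 + \|\ell_m - \ell_m^{(n)}\|_2^2 + 2\sum_{|j|\le k_n}\mathrm{Var}[\nu_n(\varphi_{m,j})].$

Now, since the $(Y_i, Z_i)$'s are independent, $\mathrm{Var}[\nu_n(\varphi_{m,j})] = n^{-1}\mathrm{Var}[Y_1 u^*_{\varphi_{m,j}}(Z_1)]$, and, arguing as in Comte et al. (2005a), by using Parseval's formula we get that

(4.5)  $\sum_{|j|\le k_n}\mathrm{Var}[\nu_n(\varphi_{m,j})] \le n^{-1}\Big\|\sum_{j\in\mathbb{Z}}|u^*_{\varphi_{m,j}}|^2\Big\|_\infty E(Y_1^2) \le E(Y_1^2)\Delta(m)/n,$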

where A is defined in Proposition Let us study the residual term \\£ m — £)% ^|||, by simply 

writting that 

b1>fc„ J |i|>fen 
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Now by definition 

ja m ,j(£) = J p(D m x - j)£(x)dx 

< D*J? J \x\\ip{D m x - j)\\£{x)\dx + ^/D \\<p(D m x - j)\\l{x)\dx 

< Df r { 2 (^J \^{D m x-j)\ 2 dx^j ' k[\ ^/D~sup\x<p(x)\¥¥- 

Consequently ja mJ < D m \\ip\\ 2 K]/ 2 + ^/D^W^ / ti , and \\£ m - < k(kc + ¥\\l)D 2 Jk n . □ 
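
The last step of this proof can be made explicit as follows. This is only a sketch under stated assumptions: we assume $k_n \ge 1$, $D_m \ge 1$ and $\|\varphi\|_2 = 1$ (as for a sinc basis, which is not restated in this section), and we write $\kappa_L$ for the constant bounding $\int x^2\ell^2(x)\,dx$. The numerical constant 4 obtained below is illustrative and is not claimed to be the constant $\kappa$ of the proposition.

```latex
\begin{align*}
\sum_{|j|>k_n} j^{-2}
  &= 2\sum_{j\ge k_n+1} j^{-2}
   \le 2\int_{k_n}^{\infty}\frac{dx}{x^{2}} = \frac{2}{k_n},\\
\Big(D_m\|\varphi\|_2\,\kappa_L^{1/2} + \sqrt{D_m}\,\|\ell\|_1/\pi\Big)^{2}
  &\le 2D_m^{2}\kappa_L + 2D_m\|\ell\|_1^{2}/\pi^{2}
   \le 2D_m^{2}\big(\kappa_L + \|\ell\|_1^{2}\big),\\
\|\ell_m-\ell_m^{(n)}\|_2^{2}
  &\le \Big(\sup_j |j\,a_{m,j}(\ell)|\Big)^{2}\sum_{|j|>k_n} j^{-2}
   \le \frac{4\big(\kappa_L+\|\ell\|_1^{2}\big)D_m^{2}}{k_n}.
\end{align*}
```

The second line uses $(a+b)^2 \le 2a^2 + 2b^2$ together with $D_m \le D_m^2$ and $\pi^{-2} \le 1$.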

4.2. Proof of Proposition 3.2. The proof of Proposition 3.2, being rather similar to the proof of Theorem 3.2, is omitted. We refer to Comte and Taupin (2004) for further details.

4.3. Proof of Theorem 3.1. We only prove the result with $\mathbb{E}(Y_1^2)$ in the penalty instead of $\hat m_2(Y)$, and refer to Comte and Taupin (2004) for the complete proof with $\hat m_2(Y)$, as an application of Rosenthal's inequality (see Rosenthal (1970)).

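For the study of $\tilde\ell$, the main difficulty compared to the study of $\tilde g$ comes from the unbounded noise $\xi$. By definition, $\tilde\ell$ satisfies, for all $m \in \mathcal{M}_{n,\ell}$, $\gamma_{n,\ell}(\tilde\ell) + \mathrm{pen}_\ell(\hat m) \le \gamma_{n,\ell}(\ell_m^{(n)}) + \mathrm{pen}_\ell(m)$. Therefore, by applying (4.2) we get that

(4.6)   $\|\tilde\ell - \ell\|_2^2 \le \|\ell - \ell_m^{(n)}\|_2^2 + 2\nu_n(\tilde\ell - \ell_m^{(n)}) + \mathrm{pen}_\ell(m) - \mathrm{pen}_\ell(\hat m)$.

Next, we use that if $t = t_1 + t_2$ with $t_1$ in $S_m$ and $t_2$ in $S_{m'}$, then $t$ is such that $t^*$ has its support in $[-\pi D_{\max(m,m')}, \pi D_{\max(m,m')}]$ and therefore $t$ belongs to $S_{m^*}$ where $m^* = \max(m, m')$. Denote by $B_{m,m'}(0,1)$ the set

    $B_{m,m'}(0,1) = \{t \in S^{(n)}_{\max(m,m')} \ / \ \|t\|_2 = 1\}$.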

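It follows that

    $|\nu_n(\tilde\ell - \ell_m^{(n)})| \le \|\tilde\ell - \ell_m^{(n)}\|_2 \sup_{t\in B_{m,\hat m}(0,1)}|\nu_n(t)|$,

where $\nu_n(t)$ is defined by (4.1). Consequently, by using that $2ab \le x^{-1}a^2 + xb^2$,

    $\|\tilde\ell - \ell\|_2^2 \le \|\ell - \ell_m^{(n)}\|_2^2 + x^{-1}\|\tilde\ell - \ell_m^{(n)}\|_2^2 + x\sup_{t\in B_{m,\hat m}(0,1)}\nu_n^2(t) + \mathrm{pen}_\ell(m) - \mathrm{pen}_\ell(\hat m)$

and therefore, writing that $\|\tilde\ell - \ell_m^{(n)}\|_2^2 \le (1 + y^{-1})\|\tilde\ell - \ell\|_2^2 + (1 + y)\|\ell - \ell_m^{(n)}\|_2^2$, with $y = (x+1)/(x-1)$ for $x > 1$, we infer that

    $\|\tilde\ell - \ell\|_2^2 \le \Big(\frac{x+1}{x-1}\Big)^2\|\ell - \ell_m^{(n)}\|_2^2 + \frac{x(x+1)}{x-1}\sup_{t\in B_{m,\hat m}(0,1)}\nu_n^2(t) + \frac{x+1}{x-1}\big(\mathrm{pen}_\ell(m) - \mathrm{pen}_\ell(\hat m)\big)$.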
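
For the reader's convenience, here is a short worked version of the algebra behind the last display. The abbreviations $A$, $B$, $S$, $P$ are introduced only for this sketch: $A = \|\tilde\ell-\ell\|_2^2$, $B = \|\ell-\ell_m^{(n)}\|_2^2$, $S$ is the supremum of $\nu_n^2(t)$ over the unit ball, and $P$ is the difference of penalties.

```latex
\begin{align*}
A &\le B + x^{-1}\big[(1+y^{-1})A + (1+y)B\big] + xS + P,
   \qquad y=\frac{x+1}{x-1},\ x>1,\\
x^{-1}(1+y^{-1}) &= \frac{2}{x+1}, \qquad x^{-1}(1+y) = \frac{2}{x-1},\\
\frac{x-1}{x+1}\,A &\le \frac{x+1}{x-1}\,B + xS + P,\\
A &\le \Big(\frac{x+1}{x-1}\Big)^{2}B + \frac{x(x+1)}{x-1}\,S + \frac{x+1}{x-1}\,P .
\end{align*}
```

The condition $x > 1$ is what keeps the coefficient $(x-1)/(x+1)$ of $A$ positive, so that the division in the last step preserves the inequality.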
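Choose some positive function $p_\ell(m, m')$ such that $x\,p_\ell(m, m') \le \mathrm{pen}_\ell(m) + \mathrm{pen}_\ell(m')$. Then, by denoting by $\kappa_x = (x+1)/(x-1)$,

(4.7)   $\|\tilde\ell - \ell\|_2^2 \le \kappa_x^2\|\ell - \ell_m^{(n)}\|_2^2 + x\kappa_x\Big[\sup_{t\in B_{m,\hat m}(0,1)}|\nu_n|^2(t) - p_\ell(m,\hat m)\Big]_+ + \kappa_x\big(x\,p_\ell(m,\hat m) + \mathrm{pen}_\ell(m) - \mathrm{pen}_\ell(\hat m)\big)$,

that is

(4.8)   $\|\tilde\ell - \ell\|_2^2 \le \kappa_x^2\|\ell - \ell_m^{(n)}\|_2^2 + 2\kappa_x\,\mathrm{pen}_\ell(m) + x\kappa_x W_n(\hat m)$,

where

(4.9)   $W_n(m') = \Big[\sup_{t\in B_{m,m'}(0,1)}|\nu_n(t)|^2 - p_\ell(m,m')\Big]_+$.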

The main point of the proof lies in studying $W_n(m')$, more precisely in finding $p_\ell(m, m')$ such that

(4.10)   $\mathbb{E}(W_n(\hat m)) \le \sum_{m'\in\mathcal{M}_{n,\ell}}\mathbb{E}(W_n(m')) \le C/n$,

where $C$ is a constant. In this case, combining (4.8) and (4.10) we infer that, for all $m$ in $\mathcal{M}_{n,\ell}$,

    $\mathbb{E}\|\ell - \tilde\ell\|_2^2 \le \kappa_x^2\|\ell - \ell_m^{(n)}\|_2^2 + 2\kappa_x\,\mathrm{pen}_\ell(m) + x\kappa_x C/n$,

which can also be written

(4.11)   $\mathbb{E}\|\ell - \tilde\ell\|_2^2 \le C_x\inf_{m\in\mathcal{M}_{n,\ell}}\big[\|\ell - \ell_m\|_2^2 + \mathrm{pen}_\ell(m)\big] + C_x C'/n$,

where $C_x = \max(\kappa_x^2, 2\kappa_x)$ suits, when $k_n \ge n$, and (3.5) and (3.10) hold. It thus remains to find $p_\ell(m, m')$ such that (4.10) holds.

The process $W_n(m')$ is studied by using the decomposition $\nu_n(t) = \nu_{n,1}(t) + \nu_{n,2}(t)$ with

(4.12)   $\nu_{n,1}(t) = \frac{1}{n}\sum_{i=1}^{n}\big(f(X_i)u_t^*(Z_i) - \langle t, \ell\rangle\big)$ and $\nu_{n,2}(t) = \frac{1}{n}\sum_{i=1}^{n}\xi_i u_t^*(Z_i)$.

It follows that $W_n(m') \le 2W_{n,1}(m') + 2W_{n,2}(m')$ where, for $i = 1, 2$,

(4.13)   $W_{n,i}(m') = \Big[\sup_{t\in B_{m,m'}(0,1)}|\nu_{n,i}(t)|^2 - p_i(m,m')\Big]_+$, and $p_\ell(m,m') = 2p_1(m,m') + 2p_2(m,m')$.
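
The inequality $W_n(m') \le 2W_{n,1}(m') + 2W_{n,2}(m')$ is elementary; the following sketch spells it out, with all suprema taken over $t \in B_{m,m'}(0,1)$ and $[u]_+ = \max(u, 0)$.

```latex
\begin{align*}
|\nu_n(t)|^{2} &= |\nu_{n,1}(t)+\nu_{n,2}(t)|^{2}
   \le 2|\nu_{n,1}(t)|^{2} + 2|\nu_{n,2}(t)|^{2},\\
\sup_t|\nu_n(t)|^{2} - p_\ell(m,m')
  &\le 2\Big(\sup_t|\nu_{n,1}(t)|^{2} - p_1(m,m')\Big)
     + 2\Big(\sup_t|\nu_{n,2}(t)|^{2} - p_2(m,m')\Big),\\
[2u+2v]_+ &\le 2[u]_+ + 2[v]_+ \quad\text{for all real } u, v .
\end{align*}
```

The second line uses the choice $p_\ell = 2p_1 + 2p_2$, and taking positive parts with the third line gives the claim.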

• Study of $W_{n,1}$.

Since under $(A_3)$, $f$ is bounded on the support of $g$, we apply a standard Talagrand (1996) inequality (see Lemma 4.1 below, which can a fortiori be applied to identically distributed variables):
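Lemma 4.1. Let $U_1, \dots, U_n$ be independent random variables and $\nu_n(r) = (1/n)\sum_{i=1}^{n}[r(U_i) - \mathbb{E}(r(U_i))]$ for $r$ belonging to a countable class $\mathcal{R}$ of uniformly bounded measurable functions. Then for $\epsilon > 0$

(4.14)   $\mathbb{E}\Big[\sup_{r\in\mathcal{R}}|\nu_n(r)|^2 - 2(1+2\epsilon)H^2\Big]_+ \le \frac{6}{K_1}\Big(\frac{v}{n}e^{-K_1\epsilon\frac{nH^2}{v}} + \frac{8M_1^2}{K_1 n^2 C^2(\epsilon)}e^{-\frac{K_1 C(\epsilon)\sqrt{\epsilon}}{\sqrt{2}}\frac{nH}{M_1}}\Big)$,

with $C(\epsilon) = \sqrt{1+\epsilon} - 1$, $K_1$ a universal constant, and where

    $\sup_{r\in\mathcal{R}}\|r\|_\infty \le M_1$,   $\mathbb{E}\Big(\sup_{r\in\mathcal{R}}|\nu_n(r)|\Big) \le H$,   $\sup_{r\in\mathcal{R}}\frac{1}{n}\sum_{i=1}^{n}\mathrm{Var}(r(U_i)) \le v$.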

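The inequality (4.14) is a straightforward consequence of Talagrand's (1996) inequality given in Ledoux (1996) (or Birgé and Massart (1998)). Therefore

(4.15)   $\mathbb{E}\Big[\sup_{t\in B_{m,m'}(0,1)}|\nu_{n,1}(t)|^2 - 2(1+2\epsilon_1)H_1^2\Big]_+ \le \kappa_1\Big(\frac{v_1}{n}e^{-K_1\epsilon_1\frac{nH_1^2}{v_1}} + \frac{M_1^2}{n^2C^2(\epsilon_1)}e^{-K_2\sqrt{\epsilon_1}C(\epsilon_1)\frac{nH_1}{M_1}}\Big)$,

where $K_2 = K_1/\sqrt{2}$ and $H_1$, $v_1$ and $M_1$ are defined by $\mathbb{E}\big(\sup_{t\in B_{m,m'}(0,1)}|\nu_{n,1}(t)|^2\big) \le H_1^2$,

    $\sup_{t\in B_{m,m'}(0,1)}\mathrm{Var}(f(X_1)u_t^*(Z_1)) \le v_1$, and $\sup_{t\in B_{m,m'}(0,1)}\|f(\cdot)u_t^*(\cdot)\|_\infty \le M_1$.

According to (3.3) and (4.5), we propose to take

(4.16)   $M_1 = M_1(m,m') = \kappa_{\infty,G}\sqrt{\lambda_1\Gamma(m^*)}$.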

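For $v_1$, denoting by $P_{j,k}$ the quantity $P_{j,k}(m) = \mathbb{E}\big[f^2(X_1)u^*_{\varphi_{m,j}}(Z_1)u^*_{\varphi_{m,k}}(-Z_1)\big]$, write

    $\sup_{t\in B_{m,m'}(0,1)}\mathrm{Var}(f(X_1)u_t^*(Z_1)) \le \Big(\sum_{j,k}|P_{j,k}(m^*)|^2\Big)^{1/2}$.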

Arguing as in Comte et al. (2005a), let us define $\Delta_2(m, \Psi)$ by



A 2 (m,tf) = D. 

with 
(4.17) 




f*(D m x)f*(D m y) 



**{D m {x-y)) 



dxdy < \ 2 2 (\\y\\ 2 )T 2 2 (m*) 



T 2 (m*) = £ 2a + min [(i/2-p/2),(i-p)] 



and A 2 (||^|| 2 ) = A 2 (a, k , P, <r, p, \\^h) given by 



(4.18) A 2 (||*|| 2 ): 
Now, write Pj t k as 

Pj,k{m) 



Ai(a, Ko,(3,a, p) 



ft 



if p > 1, 

(27r)- 1 /2 A f(a, ft : ,/3,a,p)(l + ( T 2 7r 2 r/ 2 ||vl/|| 2 if p < 1. 
that is



" J J fe( D mU) f*(D m V) 



D, 



e iju+ikv <p*(u)<p*(v) 
f*{D m u)f* £ {D m v) 



< x+y ^ u - v ^ Dm f{x)g{x)f £ {y)dxdy j dudv 



'N/V! r)i)„,)dudr. 



f*(D m u)f*(D m v 
By applying Parseval's formula we get that J2j k l-P?,fc( m )| 2 equals 
(p*(u)(p*(v) 




f*{D m u)f*{D m v 



-[(f 2 9)*fe}*((u-v)D m ) 



dudv = A 2 (m, (/ g) * f £ ). 



Since \\(pg) * f £ \\ 2 < ||/ 2 <?|| 2 ||/ £ || 2 = E 1 / 2 (/ 2 (X 1 ))||/ £ || 2 , and A 2 (||/^|| 2 ||/ e || 2 ) < by using 
the definition of $\mu_2$ given in (3.8), we propose to take

(4.19)   $v_1 = v_1(m, m') = \mu_2\Gamma_2(m^*)$.

Lastly, we have $\mathbb{E}\big[\sup_{t\in B_{m,m'}(0,1)}|\nu_{n,1}(t)|^2\big] \le \mathbb{E}(f^2(X_1))\lambda_1\Gamma(m^*)/n$ and thus we propose to take

(4.20)   $H_1^2 = H_1^2(m, m') = \mathbb{E}(f^2(X_1))\lambda_1\Gamma(m^*)/n$.

It follows from (4.15), (4.16), (4.19) and (4.20) that if

    $p_1(m, m') = 2(1+2\epsilon_1)H_1^2 = 2(1+2\epsilon_1)\mathbb{E}(f^2(X_1))\lambda_1\Gamma(m^*)/n$

then 



(4.21) E(W n>1 (m')) < E 

with 
(4.22) 



sup |^,i(t)| 2 - 2(1 + 2 ei )Mf 

t£B ,(0,1) 



< A 1 (m*) + B 1 (m*) 



A( v /x 2 r 2 (m) / 2 A x r(m 

Ai(m) = K 3 exp -AieiE(/ (Xi)) 



(4.23) and Bi(m) = K- 



K L,G A i r ( m ) 



exp <^ -X 2v /eTC7(ei) 



A* 2 r 2 (m 
VE(/ 2 (X0) 



«oo,G 



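Since for all $m \in \mathcal{M}_{n,\ell}$, $\Gamma(m) \le n$ and $|\mathcal{M}_{n,\ell}| \le n$, there exist some constants $K_4$ and $c$ such that

    $\sum_{m'\in\mathcal{M}_{n,\ell}}B_1(m^*) \le K_3\|f\|_{\infty,G}^2\lambda_1\exp\big[-K_4\sqrt{\mathbb{E}(f^2(X_1))}\sqrt{n}/\kappa_{\infty,G}\big] \le c/n$.

Let us now come to the study of $A_1(m^*)$.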
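
The reason why this sum is of order $1/n$ can be sketched as follows. We assume here, for illustration only, that each term has the form $B_1(m) \le C_B\,\lambda_1\Gamma(m)\,n^{-2}e^{-c_0\sqrt{n}}$; the constants $C_B$ and $c_0 > 0$ are placeholders that do not depend on $m$ or $n$ and are not the constants of the text.

```latex
\begin{align*}
\sum_{m'\in\mathcal{M}_{n,\ell}} B_1(m^*)
  &\le |\mathcal{M}_{n,\ell}|\; C_B\,\lambda_1\,
       \frac{\sup_m \Gamma(m)}{n^{2}}\,e^{-c_0\sqrt{n}}
   \le C_B\,\lambda_1\,e^{-c_0\sqrt{n}},\\
n\,e^{-c_0\sqrt{n}}
  &\le \sup_{x>0} x\,e^{-c_0\sqrt{x}} = \frac{4}{c_0^{2}e^{2}},
  \qquad\text{so}\qquad
  \sum_{m'\in\mathcal{M}_{n,\ell}} B_1(m^*) \le \frac{4\,C_B\,\lambda_1}{c_0^{2}e^{2}}\cdot\frac{1}{n}.
\end{align*}
```

The first line uses $\Gamma(m) \le n$ and $|\mathcal{M}_{n,\ell}| \le n$; the supremum in the second line is attained at $\sqrt{x} = 2/c_0$.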




1) Case $0 \le \rho < 1/3$. In that case, $\rho < (1/2 - \rho/2)_+$ and the choice $\epsilon_1 = 1/2$ ensures the convergence of $\sum_{m'\in\mathcal{M}_{n,\ell}}A_1(m^*)$. Indeed, if we denote by $\psi = 2\alpha + \min[(1/2 - \rho/2), (1 - \rho)]$, $\omega = (1/2 - \rho/2)_+$, $K' = K_1\lambda_1/\mu_2$, then for $a, b \ge 1$, we infer that

max(a, b\^ e 2 P° p * Pm ™( a > b ) p e - K 'Z 2m ™( a < b )"< ' e ^ p ^" aP + tfl> ^faww^-iK't* /2)(a<" '+b w ) 
is bounded by 

(4,24) a ^ e 2f3aP-KPaP e -(K'e/2)a- e -(K'^/2)b- + tfJItoWV e -(K'?/2)V>) _ 

Since the function a > a V , e 2/3o-P7rPa' , e -(ii:'5 2 /2)a" j g Dounc [ ec i on j£+ by a constant, depending 
on a, p and K' only, and since AA; P — (3k w < —(/3/2)k u for any k > 1, it follows that 

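2) Case $\rho = 1/3$. In that case, $\rho = (1/2 - \rho/2)_+$, and $\omega = \rho$. We choose $\epsilon_1 = \epsilon_1(m, m')$ such that $2\beta\sigma^\rho\pi^\rho D_{m^*}^\rho - K'\mathbb{E}(f^2(X_1))\epsilon_1 D_{m^*}^\omega = -2\beta\sigma^\rho\pi^\rho D_{m^*}^\rho$, that is, since $K' = K_1\lambda_1/\mu_2$, $\epsilon_1 = \epsilon_1(m, m') = (4\beta\sigma^\rho\pi^\rho\mu_2)/(K_1\lambda_1\mathbb{E}(f^2(X_1)))$.

3) Case $\rho > 1/3$. In that case, $\rho > (1/2 - \rho/2)_+$. Bearing in mind the inequality (4.24), we choose $\epsilon_1 = \epsilon_1(m, m')$ such that $2\beta\sigma^\rho\pi^\rho D_{m^*}^\rho - K'\mathbb{E}(f^2(X_1))\epsilon_1 D_{m^*}^\omega = -2\beta\sigma^\rho\pi^\rho D_{m^*}^\rho$, that is, since $K' = K_1\lambda_1/\mu_2$, $\epsilon_1 = \epsilon_1(m, m') = (4\beta\sigma^\rho\pi^\rho\mu_2)/(K_1\lambda_1\mathbb{E}(f^2(X_1)))\,D_{m^*}^{\rho-\omega}$.

These choices ensure that $\sum_{m'\in\mathcal{M}_{n,\ell}}A_1(m^*)$ is less than $C/n$.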
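
The choice of $\epsilon_1$ in cases 2) and 3) comes from solving a linear equation in $\epsilon_1$; a short worked version is given below. We abbreviate $D = D_{m^*}$ and use $\omega = (1/2 - \rho/2)_+$ and $K' = K_1\lambda_1/\mu_2$ as in the text.

```latex
\begin{align*}
2\beta\sigma^{\rho}\pi^{\rho}D^{\rho} - K'\,\mathbb{E}(f^{2}(X_1))\,\epsilon_1 D^{\omega}
   &= -2\beta\sigma^{\rho}\pi^{\rho}D^{\rho}\\
\Longleftrightarrow\quad
K'\,\mathbb{E}(f^{2}(X_1))\,\epsilon_1 D^{\omega} &= 4\beta\sigma^{\rho}\pi^{\rho}D^{\rho}\\
\Longleftrightarrow\quad
\epsilon_1 &= \frac{4\beta\sigma^{\rho}\pi^{\rho}\mu_2}{K_1\lambda_1\,\mathbb{E}(f^{2}(X_1))}\,D^{\rho-\omega}.
\end{align*}
% Threshold between the cases: rho < 1/2 - rho/2  <=>  3 rho / 2 < 1/2  <=>  rho < 1/3.
% At rho = 1/3 one has omega = 1/2 - 1/6 = 1/3 = rho, so D^{rho - omega} = 1.
```

This shows why $\epsilon_1$ is a constant when $\rho = 1/3$ and grows like $D_{m^*}^{\rho-\omega}$ when $\rho > 1/3$.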
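• Study of $W_{n,2}$.

Denote by

(4.25)   $\mathbb{H}_\xi^2(m, m') = \Big(n^{-1}\sum_{i=1}^{n}\xi_i^2\Big)\lambda_1\Gamma(m^*)/n$,

with $\big(n^{-1}\sum_{i=1}^{n}\xi_i^2\big)\lambda_1\Gamma(m)/n = \big(n^{-1}\sum_{i=1}^{n}(\xi_i^2 - \sigma_\xi^2)\big)\lambda_1\Gamma(m)/n + \sigma_\xi^2\lambda_1\Gamma(m)/n$ bounded by

    $\Big(n^{-1}\sum_{i=1}^{n}(\xi_i^2 - \sigma_\xi^2)\Big)\mathbf{1}_{\{n^{-1}|\sum_{i=1}^{n}(\xi_i^2 - \sigma_\xi^2)| > \sigma_\xi^2/2\}}\,\lambda_1\Gamma(m)/n + 3\sigma_\xi^2\lambda_1\Gamma(m)/(2n)$.

Consequently $\mathbb{H}_\xi^2(m, m') \le \mathbb{H}_{\xi,1}(m, m') + \mathbb{H}_{\xi,2}(m, m')$ where

    $\mathbb{H}_{\xi,1}(m, m') = \Big(n^{-1}\sum_{i=1}^{n}(\xi_i^2 - \sigma_\xi^2)\Big)\mathbf{1}_{\{n^{-1}|\sum_{i=1}^{n}(\xi_i^2 - \sigma_\xi^2)| > \sigma_\xi^2/2\}}\,\lambda_1\Gamma(m^*)/n$ and $\mathbb{H}_{\xi,2}(m, m') = 3\sigma_\xi^2\lambda_1\Gamma(m^*)/(2n)$.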

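By applying (4.12) we infer that $\mathbb{E}\big[\sup_{t\in B_{m,m'}(0,1)}|\nu_{n,2}(t)|^2 - p_2(m, m')\big]_+$ is bounded by

    $\mathbb{E}\Big[2\sup_{t\in B_{m,m'}(0,1)}\Big(n^{-1}\sum_{i=1}^{n}\xi_i(u_t^*(Z_i) - \langle t, g\rangle)\Big)^2 - 4(1+2\epsilon_2)\mathbb{H}_\xi^2(m, m')\Big]_+ + 2\|g\|_2^2\,\mathbb{E}\Big[\Big(n^{-1}\sum_{i=1}^{n}\xi_i\Big)^2\Big] + \mathbb{E}\big[4(1+2\epsilon_2)\mathbb{H}_\xi^2(m, m') - p_2(m, m')\big]_+$,

that is

(4.26)   $\mathbb{E}\Big[\sup_{t\in B_{m,m'}(0,1)}|\nu_{n,2}(t)|^2 - p_2(m, m')\Big]_+$
    $\le 2\,\mathbb{E}\Big[\sup_{t\in B_{m,m'}(0,1)}\Big(n^{-1}\sum_{i=1}^{n}\xi_i(u_t^*(Z_i) - \langle t, g\rangle)\Big)^2 - 2(1+2\epsilon_2)\mathbb{H}_\xi^2(m, m')\Big]_+ + 2\|g\|_2^2\sigma_\xi^2/n$
    $\quad + 4(1+2\epsilon_2)\mathbb{E}|\mathbb{H}_{\xi,1}(m, m')| + \mathbb{E}\big[4(1+2\epsilon_2)\mathbb{H}_{\xi,2}(m, m') - p_2(m, m')\big]_+$.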
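
The term $2\|g\|_2^2\sigma_\xi^2/n$ in (4.26) comes from recentering $u_t^*(Z_i)$; a brief sketch follows. It assumes, as is standard for this regression model but not restated in this section, that the $\xi_i$ are i.i.d., centered, with variance $\sigma_\xi^2$.

```latex
\begin{align*}
\nu_{n,2}(t) &= \frac1n\sum_{i=1}^{n}\xi_i\big(u_t^*(Z_i)-\langle t,g\rangle\big)
               + \langle t,g\rangle\,\frac1n\sum_{i=1}^{n}\xi_i,\\
|\nu_{n,2}(t)|^{2} &\le 2\Big(\frac1n\sum_{i=1}^{n}\xi_i\big(u_t^*(Z_i)-\langle t,g\rangle\big)\Big)^{2}
               + 2\,\|g\|_2^{2}\Big(\frac1n\sum_{i=1}^{n}\xi_i\Big)^{2}
               \qquad(\|t\|_2=1),\\
\mathbb{E}\Big[\Big(\frac1n\sum_{i=1}^{n}\xi_i\Big)^{2}\Big] &= \frac{\sigma_\xi^{2}}{n}.
\end{align*}
```

The second line uses the Cauchy-Schwarz bound $|\langle t, g\rangle| \le \|t\|_2\|g\|_2$ on the unit ball.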

Since we only consider dimensions $D_m$ such that $\Gamma(m)/n$ is bounded by some constant $\kappa$, we get that for some $p \ge 2$, $\mathbb{E}|\mathbb{H}_{\xi,1}(m, m')|$ is bounded by

1 n n 



i=l 



According to Rosenthal's inequality (see Rosenthal (1970)), we find that, for cr| := E(|£| p ), cr| 2 



i=l 



Now, the assumption $(A_1)$ implies that $\alpha > 1/2$, therefore $|\mathcal{M}_n| \le \sqrt{n}$ and consequently, by choosing $p = 3$, this leads to $\sum_{m'\in\mathcal{M}_{n,\ell}}\mathbb{E}|\mathbb{H}_{\xi,1}(m, m')| \le C(\sigma_{\xi,6}, \sigma_\xi)/n$. The last term of the inequality (4.26) vanishes as soon as

(4.27)   $p_2(m, m') = 4(1+2\epsilon_2)\mathbb{H}_{\xi,2}(m, m') = 6(1+2\epsilon_2)\lambda_1\sigma_\xi^2\Gamma(m^*)/n$.

For this choice of $p_2(m, m')$, the inequality (4.26) shows that $\mathbb{E}\big[\sup_{t\in B_{m,\hat m}(0,1)}|\nu_{n,2}(t)|^2 - p_2(m, \hat m)\big]_+$ is less than

    $2\sum_{m'\in\mathcal{M}_{n,\ell}}\mathbb{E}\Big[\sup_{t\in B_{m,m'}(0,1)}\Big(n^{-1}\sum_{i=1}^{n}\xi_i(u_t^*(Z_i) - \langle t, g\rangle)\Big)^2 - 2(1+2\epsilon_2)\mathbb{H}_\xi^2(m, m')\Big]_+ + 2\|g\|_2^2\sigma_\xi^2/n + 4C(1+2\epsilon_2)/n$.

Then we apply the following lemma to reach the same kind of result as (4.15) for $W_{n,2}$.

Lemma 4.2. Under the assumptions of Theorem 3.1, if $\mathbb{E}|\xi_1|^6 < \infty$, then for some given $\epsilon_2 > 0$:



(4.28) E 



sup 

*e£_ m , (0,1) 



-^iiu^Zi) - (t,g)\ - 2(1 + 2e 2 )eJ(m,m') 

U i=l / 



r a 2 fi 2 T 2 (m* 



n 



exp -Kie 2 



Air(m*) 
/i 2 r 2 (m*) 



ln 4 (n)\ 1 



n n 
where $\mu_2$ and $\Gamma_2(m)$ are defined by (3.8) and (4.17), and $K_1$ is a constant depending on the moments of $\xi$. The constant $\mu_2$ can be replaced by $\lambda_2(\|f_\varepsilon\|_2)$ where $\lambda_2$ is defined by (4.18).

By analogy with (|4.22|) we denote by 
(4.29) 

n V /i 2 r 2 (m*) / n \ /x 2 / 

With p 2 (m,m') given by ()4.27|) . by gathering (J4.15)) and (|4.28jl . we find, for W n ^ defined by 
(EH, 

E(W n , 2 (m)) < if A 2 (m*) + C(l + ln(n) 6 /n) /n + if'/n. 

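The sum $\sum_{m'\in\mathcal{M}_{n,\ell}}A_2(m^*)$ is bounded in the same way as the sum $\sum_{m'\in\mathcal{M}_{n,\ell}}A_1(m^*)$, with $\epsilon_2 = \epsilon_1 = 1/2$ if $0 \le \rho < 1/3$, and $\epsilon_1(m, m')$ replaced by $\epsilon_2 = \epsilon_2(m, m') = \mathbb{E}(f^2(X_1))\epsilon_1(m, m')$ when $\rho \ge 1/3$, that is $\epsilon_2(m, m') = (4\beta\sigma^\rho\pi^\rho\mu_2)/(K_1\lambda_1)\,D_{m^*}^{\rho-\omega}$. These choices ensure that $\sum_{m'\in\mathcal{M}_{n,\ell}}A_2(m^*)$ is less than $C/n$. The result follows by taking, as announced in (4.13), $p_\ell(m, m') = 2p_1(m, m') + 2p_2(m, m')$, that is $p_\ell(m, m') = 4\big[(1+2\epsilon_1(m, m'))\mathbb{E}(f^2(X_1)) + 3(1+2\epsilon_2(m, m'))\sigma_\xi^2\big]\lambda_1\Gamma(m^*)/n$, and more precisely, if $0 \le \rho < 1/3$,

(4.30)   $p_\ell(m, m') = 24\,\mathbb{E}(Y_1^2)\lambda_1\Gamma(m^*)/n$,

and if $\rho \ge 1/3$,

(4.31)   $p_\ell(m, m') = 4\big[3\mathbb{E}(Y_1^2) + 32\beta\sigma^\rho\pi^\rho\mu_2 D_{m^*}^{\rho-\omega}/(K_1\lambda_1)\big]\lambda_1\Gamma(m^*)/n$.

Consequently, if $0 \le \rho < 1/3$, we take $\mathrm{pen}_\ell(m) = \kappa\,\mathbb{E}(Y_1^2)\lambda_1\Gamma(m)/n$, and if $\rho \ge 1/3$ we take $\mathrm{pen}_\ell(m) = \kappa\big[\mathbb{E}(Y_1^2) + \beta\sigma^\rho\pi^\rho\mu_2 D_m^{\rho-\omega}/(K_1\lambda_1)\big]\lambda_1\Gamma(m)/n$, for some numerical constants $\kappa$. Note that for $\rho = 1/3$, $\rho - \omega = 0$ and the second penalty has the same order as the first one, with a different multiplicative constant. □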
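
As a consistency check on the constant 24 in (4.30), the following sketch combines the two pieces $p_1$ and $p_2$ when $\epsilon_1 = \epsilon_2 = 1/2$. It assumes that $\xi_1$ is centered and independent of $X_1$, so that $\mathbb{E}(Y_1^2) = \mathbb{E}(f^2(X_1)) + \sigma_\xi^2$; this identity is our assumption for the sketch and is not restated in this section.

```latex
\begin{align*}
p_\ell(m,m') &= 2p_1(m,m') + 2p_2(m,m')\\
  &= \big[4(1+2\epsilon_1)\,\mathbb{E}(f^{2}(X_1)) + 12(1+2\epsilon_2)\,\sigma_\xi^{2}\big]\,
     \frac{\lambda_1\Gamma(m^*)}{n}\\
  &= \big[8\,\mathbb{E}(f^{2}(X_1)) + 24\,\sigma_\xi^{2}\big]\,\frac{\lambda_1\Gamma(m^*)}{n}
     \qquad(\epsilon_1=\epsilon_2=\tfrac12)\\
  &\le 24\,\big[\mathbb{E}(f^{2}(X_1)) + \sigma_\xi^{2}\big]\,\frac{\lambda_1\Gamma(m^*)}{n}
   = 24\,\mathbb{E}(Y_1^{2})\,\frac{\lambda_1\Gamma(m^*)}{n}.
\end{align*}
```

Under this reading, (4.30) is a convenient upper choice for $p_\ell$ rather than an exact identity, which is harmless since any larger $p_\ell$ still satisfies (4.10).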

4.4. Proof of Lemma 4.2 by using a conditioning argument. We work conditionally to the $\xi_i$'s, and $\mathbb{E}_\xi$ and $\mathbb{P}_\xi$ denote the conditional expectation and probability for fixed $\xi_1, \dots, \xi_n$.

We apply Lemma 4.1 with $f_t(\xi_i, Z_i) = \xi_i u_t^*(Z_i)$, conditionally to the $\xi_i$'s, to the random variables $(\xi_1, Z_1), \dots, (\xi_n, Z_n)$, which are independent but not identically distributed since the $\xi_i$'s are fixed constants. Let $Q_{j,k} = \mathbb{E}\big[u^*_{\varphi_{m,j}}(Z_1)u^*_{\varphi_{m,k}}(-Z_1)\big]$. Straightforward calculations give that for $\mathbb{H}_\xi^2(m, m')$ defined in (4.25) we have

    $\mathbb{E}_\xi^2\Big[\sup_{t\in B_{m,m'}(0,1)} n^{-1}\sum_{i=1}^{n}\xi_i(u_t^*(Z_i) - \langle t, g\rangle)\Big] \le \mathbb{H}_\xi^2(m, m')$.

Again, arguing as in Comte et al. (2005a), ^2 jtk \Qj,k\ 2 < A 2 (m,h) < A 2 (||/i|| 2 )r 2 (m, ||/ e || 2 ) 
with \\h\\ 2 < \\f £ \\ 2 , where A 2 (m,h) is defined by JHHJ), A 2 by (jgHD , T 2 (m) by (j4~T7J) . // 2 by 
(3.8). We now write that



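    $\sup_{t\in B_{m,m'}(0,1)}\Big(n^{-1}\sum_{i=1}^{n}\mathrm{Var}_\xi(\xi_i u_t^*(Z_i))\Big) \le \Big(n^{-1}\sum_{i=1}^{n}\xi_i^2\Big)\mu_2\Gamma_2(m^*, \|f_\varepsilon\|_2)$

and thus we take the right-hand side of this inequality as the variance bound $v$ required in Lemma 4.1.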



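Lastly, since

    $\sup_{t\in B_{m,m'}(0,1)}\|f_t\|_\infty \le 2\max_{1\le i\le n}|\xi_i|\sqrt{\Delta(m^*)} \le 2\max_{1\le i\le n}|\xi_i|\sqrt{\lambda_1\Gamma(m^*)}$,

we take $M_{1,\xi}(m, m') = 2\max_{1\le i\le n}|\xi_i|\sqrt{\lambda_1\Gamma(m^*)}$. By applying Lemma 4.1 we get, for some constants $\kappa_1$, $\kappa_2$, $\kappa_3$,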



E € [ sup vl^t) - 2(1 + 2e)H 2 -] + < Kl 

*6B m , m /(0,l) 



E ^ 6XP I 

i=l ^ 



/x 2 r 2 (m* 



max £ 2 ) exp < — K 3 y/eC(e 



Ki<n 



maxi 



To relax the conditioning, it suffices to integrate the above expression with respect to the law of the $\xi_i$'s. The first term in the bound simply becomes

    $\sigma_\xi^2\mu_2\Gamma_2(m^*)\exp\big[-\kappa_2\epsilon\lambda_1\Gamma(m^*)/(\mu_2\Gamma_2(m^*))\big]/n$

and has the same order as in the case of bounded variables. The second term is bounded by

(4.32)   $\frac{\lambda_1\Gamma(m^*)}{n^2}\,\mathbb{E}\Big[\Big(\max_{1\le i\le n}|\xi_i|^2\Big)\exp\Big(-\kappa_3\sqrt{\epsilon}\,C(\epsilon)\frac{\sqrt{\sum_{i=1}^{n}\xi_i^2}}{\max_{1\le i\le n}|\xi_i|}\Big)\Big]$.

Since we only consider dimensions $D_m$ such that the penalty term is bounded, we have $\Gamma(m)/n \le K$, and the sum of the above terms for $m \in \mathcal{M}_{n,\ell}$ and $|\mathcal{M}_{n,\ell}| \le n$ is less than

    $\lambda_1\,\mathbb{E}\Big[\Big(\max_{1\le i\le n}\xi_i^2\Big)\exp\Big(-\kappa_3\sqrt{\epsilon}\,C(\epsilon)\frac{\sqrt{\sum_{i=1}^{n}\xi_i^2}}{\max_{1\le i\le n}|\xi_i|}\Big)\Big]$.

We need to study when such a term is less than $c/n$ for some constant $c$. We bound $\max_i|\xi_i|$ by $b$ on the set $\{\max_i|\xi_i| \le b\}$ and the exponential by 1 on the set $\{\max_i|\xi_i| > b\}$, and by denoting $\mu_\epsilon = \kappa_3\sqrt{\epsilon}\,C(\epsilon)$, this yields
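    $\mathbb{E}\Big[\Big(\max_{1\le i\le n}\xi_i^2\Big)\exp\Big(-\mu_\epsilon\frac{\sqrt{\sum_{i=1}^{n}\xi_i^2}}{\max_{1\le i\le n}|\xi_i|}\Big)\Big]$
    $\le b^2\,\mathbb{E}\Big[\exp\Big(-\mu_\epsilon\frac{\sqrt{\sum_{i=1}^{n}\xi_i^2}}{b}\Big)\Big] + \mathbb{E}\Big(\max_{1\le i\le n}\xi_i^2\,\mathbf{1}_{\{\max_{1\le i\le n}|\xi_i| > b\}}\Big)$
    $\le b^2\Big(\exp\big(-\mu_\epsilon\sqrt{n}\,\sigma_\xi/(\sqrt{2}\,b)\big) + \mathbb{P}\Big(\Big|\frac{1}{n}\sum_{i=1}^{n}\xi_i^2 - \sigma_\xi^2\Big| > \sigma_\xi^2/2\Big)\Big) + b^{-r}\,\mathbb{E}\Big(\max_{1\le i\le n}|\xi_i|^{r+2}\Big)$
    $\le b^2\Big(\exp\big(-\mu_\epsilon\sqrt{n}\,\sigma_\xi/(\sqrt{2}\,b)\big) + \frac{2^p}{\sigma_\xi^{2p}}\,\mathbb{E}\Big|\frac{1}{n}\sum_{i=1}^{n}(\xi_i^2 - \sigma_\xi^2)\Big|^p\Big) + b^{-r}\,\mathbb{E}\Big(\max_{1\le i\le n}|\xi_i|^{r+2}\Big)$.

Again by applying Rosenthal's inequality (see Rosenthal (1970)), we get that

    $\mathbb{E}\Big[\Big(\max_{1\le i\le n}\xi_i^2\Big)\exp\Big(-\mu_\epsilon\frac{\sqrt{\sum_{i=1}^{n}\xi_i^2}}{\max_{1\le i\le n}|\xi_i|}\Big)\Big]$
    $\le b^2 e^{-\mu_\epsilon\sqrt{n}\sigma_\xi/(\sqrt{2}b)} + b^2\frac{2^pC(p)}{\sigma_\xi^{2p}n^p}\Big[n\,\mathbb{E}(|\xi_1^2 - \sigma_\xi^2|^p) + (n\,\mathbb{E}(\xi_1^4))^{p/2}\Big] + n\,\mathbb{E}(|\xi_1|^{r+2})\,b^{-r}$,

also bounded by

    $b^2 e^{-\mu_\epsilon\sqrt{n}\sigma_\xi/(\sqrt{2}b)} + C''(p)\,b^2\,\sigma_{\xi,2p}^{2p}\,\sigma_\xi^{-2p}\big[n^{1-p} + n^{-p/2}\big] + n\,\sigma_{\xi,r+2}^{r+2}\,b^{-r}$.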

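Since $\mathbb{E}|\xi_1|^6 < \infty$, we take $p = 3$, $r = 4$, $b = \sigma_\xi\sqrt{\epsilon}\,C(\epsilon)\kappa_3\sqrt{n}/[2\sqrt{2}(\ln(n) - \ln\ln n)]$, and for any $n \ge 3$, and for $C_1$ and $C_2$ some constants depending on the moments of $\xi$, we find that

    $\mathbb{E}\Big[\Big(\max_{1\le i\le n}\xi_i^2\Big)\exp\Big(-\kappa_3\sqrt{\epsilon}\,C(\epsilon)\frac{\sqrt{\sum_{i=1}^{n}\xi_i^2}}{\max_{1\le i\le n}|\xi_i|}\Big)\Big] \le \frac{C_1}{\sqrt{n}} + C_2\frac{\ln^4(n)}{n}$.

Then the sum over $\mathcal{M}_{n,\ell}$, with cardinality less than $\sqrt{n}$, of the terms in (4.32) is bounded by $C(1 + \ln(n)^4/\sqrt{n})/n$ for some constant $C$, by using again that $\Gamma(m^*)/n$ is bounded.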
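
To see where the orders $n^{-1/2}$ and $\ln^4(n)/n$ come from, the sketch below evaluates a three-term bound of the form $b^2e^{-\mu_\epsilon\sqrt{n}\sigma_\xi/(\sqrt{2}b)} + C\,b^2[n^{1-p} + n^{-p/2}] + C\,n\,b^{-r}$ at $p = 3$, $r = 4$ and the stated $b$. The shorthand $L_n = \ln n - \ln\ln n$ is introduced only for this sketch, and moment constants are omitted.

```latex
\begin{align*}
b &= \frac{\mu_\epsilon\sigma_\xi\sqrt{n}}{2\sqrt{2}\,L_n}
  \;\Longrightarrow\;
  \frac{\mu_\epsilon\sqrt{n}\,\sigma_\xi}{\sqrt{2}\,b} = 2L_n,
  \qquad e^{-2L_n} = \frac{\ln^{2}(n)}{n^{2}},\\
b^{2}e^{-2L_n} &= \frac{\mu_\epsilon^{2}\sigma_\xi^{2}}{8n}\Big(\frac{\ln n}{L_n}\Big)^{2}
   \le \frac{\mu_\epsilon^{2}\sigma_\xi^{2}}{8}\Big(\frac{e}{e-1}\Big)^{2}\frac{1}{n},\\
b^{2}\big(n^{-2}+n^{-3/2}\big) &\le \frac{2b^{2}}{n^{3/2}}
   = \frac{\mu_\epsilon^{2}\sigma_\xi^{2}}{4L_n^{2}\sqrt{n}}
   \le \frac{\mu_\epsilon^{2}\sigma_\xi^{2}}{4\sqrt{n}},\\
n\,b^{-4} &= \frac{64\,L_n^{4}}{\mu_\epsilon^{4}\sigma_\xi^{4}\,n}
   \le \frac{64\,\ln^{4}(n)}{\mu_\epsilon^{4}\sigma_\xi^{4}\,n}.
\end{align*}
% Facts used, valid for n >= 3: with u = ln n > 1, (ln u)/u <= 1/e, u - ln u >= 1, and ln u >= 0.
```

The middle term is the one of order $n^{-1/2}$; after multiplication by the at most $\sqrt{n}$ models and by $\lambda_1\Gamma(m^*)/n^2 \le K\lambda_1/n$, it contributes a term of order $1/n$, in line with the bound stated above.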

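4.5. Proof of Theorem 3.2. Let $E_n$ be the event $E_n = \{\|g - \tilde g\|_{\infty,A} \le g_0/2\}$. Since $g(x) \ge g_0$ for any $x$ in $A$, then, on $E_n$, $\tilde g(x) \ge g_0/2$ also for any $x$ in $A$. It follows that

(4.33)   $\mathbb{E}\|(f - \tilde f)\mathbf{1}_A\mathbf{1}_{E_n}\|_2^2 \le 8g_0^{-2}\,\mathbb{E}\|\tilde\ell - \ell\|_2^2 + 8\|\ell\|_{\infty,A}^2\,g_0^{-4}\,\mathbb{E}\|\tilde g - g\|_2^2$,

where $\|\ell\|_{\infty,A} \le g_1\kappa_{\infty,G}$. Using that $\|\tilde f\|_{\infty,A} \le a_n$, we obtain

(4.34)   $\mathbb{E}\big[\|(f - \tilde f)\mathbf{1}_A\mathbf{1}_{E_n^c}\|_2^2\big] \le 2(a_n^2 + \|f\|_{\infty,A}^2)\lambda(A)\,\mathbb{P}(E_n^c)$,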



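where $\lambda(A) = \int_A dx$. It follows that for $\hat m_\ell = \hat m_\ell(n)$, $\hat m_g = \hat m_g(n)$, if $a_n^2\,\mathbb{P}(E_n^c) = o(n^{-1})$ then (3.11) is proved by applying Theorem 3.1. We now come to the study of $\mathbb{P}(E_n^c)$ by writing that $\mathbb{P}(E_n^c) = \mathbb{P}(\|g - \tilde g\|_\infty > g_0/2) = \mathbb{P}(\|g - g_{\hat m_g}^{(n)} + g_{\hat m_g}^{(n)} - \tilde g\|_\infty > g_0/2)$. We use the following lemma.

Lemma 4.3. Let $g$ belong to $\mathcal{S}_{a_g,r_g,B_g}(\kappa_{a_g})$ defined by $(R_1)$ with $a_g > 1/2$. Then for $t \in S_m$, $\|t\|_\infty \le \sqrt{D_m}\|t\|_2$ and $\|g - g_m\|_\infty \le (2\pi)^{-1}\sqrt{\pi D_m}\,((\pi D_m)^2 + 1)^{-a_g/2}\exp(-B_g|\pi D_m|^{r_g})\,\kappa_{a_g}^{1/2}$.

By applying Lemma 4.3 and by arguing as for $\|\ell_m - \ell_m^{(n)}\|_2^2$, we get that $\|g - g_{\hat m_g}^{(n)}\|_\infty \le \|g - g_{\hat m_g}\|_\infty + \|g_{\hat m_g} - g_{\hat m_g}^{(n)}\|_\infty$, also bounded by

    $\sqrt{\kappa(\kappa_G + 1)}\,D_{\hat m_g}^{3/2}/\sqrt{k_n} + (2\pi)^{-1}\sqrt{\pi D_{\hat m_g}}\,((\pi D_{\hat m_g})^2 + 1)^{-a_g/2}\exp(-B_g|\pi D_{\hat m_g}|^{r_g})\,\kappa_{a_g}^{1/2}$.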



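Consequently, $\|g - g_{\hat m_g}^{(n)}\|_\infty$ tends to zero as soon as $g$ belongs to some space $\mathcal{S}_{a_g,r_g,B_g}(\kappa_{a_g})$ defined by $(R_1)$ with $a_g > 1/2$ if $r_g = 0$, since $k_n \ge n^{3/2}$ and $D_{m_g} = o(\sqrt{n})$ for $\alpha > 1/2$. It follows that for $n$ large enough, $\|g - g_{\hat m_g}^{(n)}\|_\infty \le g_0/4$ and consequently $\mathbb{P}(E_n^c) \le \mathbb{P}\big[\|g_{\hat m_g}^{(n)} - \tilde g\|_\infty > g_0/4\big]$. By applying again Lemma 4.3, since $g_{\hat m_g}^{(n)} - \tilde g$ belongs to $S_{\hat m_g}$, we get that

(4.35)   $\mathbb{P}(E_n^c) \le \mathbb{P}\big[\|g_{\hat m_g}^{(n)} - \tilde g\|_2 > g_0/(4\sqrt{D_{\hat m_g}})\big]$.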

In this context, we have

(4.36)   $\|\tilde g - g_{\hat m_g}^{(n)}\|_2^2 = \sum_{|j|\le k_n}(\hat a_{\hat m_g,j} - a_{\hat m_g,j})^2 = \sum_{|j|\le k_n}\nu_{n,g}^2(\varphi_{\hat m_g,j}) = \sup_{t\in B_{\hat m_g}(0,1)}\nu_{n,g}^2(t)$.

Consequently,

    $\mathbb{P}(E_n^c) \le \mathbb{P}\Big[\sup_{t\in B_{\hat m_g}(0,1)}|\nu_{n,g}(t)| > g_0/(4\sqrt{D_{\hat m_g}})\Big] \le \sum_{m\in\mathcal{M}_{n,g}}\mathbb{P}\Big[\sup_{t\in B_m(0,1)}|\nu_{n,g}(t)| > g_0/(4\sqrt{D_m})\Big]$.

We apply Talagrand's (1996) inequality as given in Birgé and Massart (1998), to get that if we take $\lambda = g_0/(8\sqrt{D_m})$ and if we ensure $2H \le g_0/(8\sqrt{D_m})$, then $\mathbb{P}\big[\sup_{t\in B_m(0,1)}|\nu_{n,g}(t)| > g_0/(4\sqrt{D_m})\big] \le 3\exp\big[-K_1'n\min\big((D_m v)^{-1}, (M_1\sqrt{D_m})^{-1}\big)\big]$. This yields

(4.37)   $\mathbb{P}(E_n^c) \le K\sum_{m\in\mathcal{M}_{n,g}}\Big\{\exp\big[-K_1'n/(M_1\sqrt{D_m})\big] + \exp\big[-K_1'n/(D_m v)\big]\Big\}$.

Since we only consider $D_m$ such that $D_m \le \sqrt{n}$,

    $a_n^2|\mathcal{M}_n|\exp\big[-K_1'n/(M_1\sqrt{D_m})\big] \le a_n^2|\mathcal{M}_n|\exp(-K''n^{1/4}) = o(n^{-1})$.

We only consider $D_m$ such that $\Gamma(m)/n$ tends to zero. Consequently, when $\rho > 0$, then $D_m \le (\ln n/(2\beta\sigma^\rho + 1))^{1/\rho}$, which combined with the fact that $v \le D_m^{2\alpha+1-\rho}\exp(2\beta\sigma^\rho\pi^\rho D_m^\rho)$ gives that $a_n^2|\mathcal{M}_n|\exp(-K_1'n/(D_m v)) = o(1/n)$.

When $\rho = 0$, then $v = \mu_1D_m^{2\alpha+1/2}$ and consequently, as $D_m \le (n/\ln(n))^{1/(2\alpha+2)} \le n^{1/(2\alpha+2)}$,

    $\exp(-K_1'n/(D_m v)) \le \exp(-K''n/D_m^{2\alpha+3/2}) \le \exp(-K''n^{1/(4(\alpha+1))})$.
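
The exponent $1/(4(\alpha+1))$ follows from a one-line computation, sketched here under the reading that $D_m v$ is of order $D_m^{2\alpha+3/2}$ and that $D_m \le n^{1/(2\alpha+2)}$.

```latex
\begin{align*}
D_m^{\,2\alpha+3/2} &\le n^{\frac{2\alpha+3/2}{2\alpha+2}},\\
\frac{n}{D_m^{\,2\alpha+3/2}} &\ge n^{\,1-\frac{2\alpha+3/2}{2\alpha+2}}
   = n^{\frac{1/2}{2\alpha+2}} = n^{\frac{1}{4(\alpha+1)}} .
\end{align*}
```

Since this exponent is positive, $\exp(-K''n^{1/(4(\alpha+1))})$ decays faster than any power of $n$, which is what is needed to absorb the polynomial factors in front of the exponential.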

Analogously, $\sqrt{D_m}H \le 1/\sqrt{\ln(n)}$ in the worst case corresponding to $\rho = 0$, for $D_m \le (n/\ln(n))^{1/(2\alpha+2)}$, tends to zero and therefore is bounded by $g_0/8$ for $n$ large enough. We conclude that if we only consider $D_m$ such that $D_m \le n^{1/(2\alpha+2)}$ then $a_n^2\,\mathbb{P}(E_n^c) = o(1/n)$, and the result follows by applying the inequalities (4.33) and (4.34). □
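
The constants $8g_0^{-2}$ and $8g_0^{-4}$ in (4.33) can be recovered from a pointwise decomposition of the ratio. The sketch below ignores the truncation at level $a_n$, that is, it assumes $\tilde f = \tilde\ell/\tilde g$ on $A$, which is a simplifying assumption of this sketch; it uses $g \ge g_0$ on $A$ and $\tilde g \ge g_0/2$ on $A$ when $E_n$ holds.

```latex
\begin{align*}
f-\tilde f &= \frac{\ell}{g}-\frac{\tilde\ell}{\tilde g}
   = \frac{\ell-\tilde\ell}{\tilde g} + \frac{\ell\,(\tilde g-g)}{g\,\tilde g},\\
|f-\tilde f|^{2}\,\mathbf{1}_A\mathbf{1}_{E_n}
  &\le 2\,\frac{|\ell-\tilde\ell|^{2}}{\tilde g^{2}}
     + 2\,\frac{\ell^{2}\,|\tilde g-g|^{2}}{g^{2}\tilde g^{2}}
   \le \frac{8}{g_0^{2}}\,|\ell-\tilde\ell|^{2}
     + \frac{8\,\|\ell\|_{\infty,A}^{2}}{g_0^{4}}\,|\tilde g-g|^{2}.
\end{align*}
```

Integrating over $A$ and taking expectations then gives a bound of the form (4.33), with the $L^2(A)$ norms on the right bounded by the corresponding $L^2(\mathbb{R})$ norms.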

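Proof of Lemma 4.3. For $t \in S_m$, written as $t(x) = \sum_{j\in\mathbb{Z}}\langle t, \varphi_{m,j}\rangle\varphi_{m,j}(x)$, we have $|t(x)|^2 \le \sum_{j\in\mathbb{Z}}|\langle t, \varphi_{m,j}\rangle|^2\sum_{j\in\mathbb{Z}}|(\varphi_{m,j}^*)^*(-x)|^2/(2\pi)^2$ with, by applying Parseval's formula,

    $\sum_{j\in\mathbb{Z}}|\langle t, \varphi_{m,j}\rangle|^2\sum_{j\in\mathbb{Z}}|(\varphi_{m,j}^*)^*(-x)|^2/(2\pi)^2 = \|t\|_2^2\,D_m\int|\varphi^*(u)|^2du/(2\pi) = D_m\|t\|_2^2$.

Let $b$ be such that $1/2 < b < a_g$. Since $\|g - g_m\|_\infty \le (2\pi)^{-1}\int_{|x|>\pi D_m}|g^*(x)|\,dx$ we get that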
\\9-9m\\oo < (27r)- 1 ((nD m ) 2 + lp a ^ 2 e- B ^ D ^ 9 [ \g*(x)\(x 2 + l)^~ b » 2 e*^" 9 dx 

J \x\>irD m 

also bounded by 

^((TrAn) 2 + l)-^- h)/2 eM-B 9 WD m \^)K]! 2 J ! (x 2 + l)- b dx 

2n \jJ\x\>^D m 

< (2n)- 1 ((nD m ) 2 + l)-(«*-»/ 2 expt-fl^J^VAn) 172 " 6 

< (27r)- 1 v / ^((7r J D m ) 2 + l)^' 2 exp(-B g \irD m \ r s)K l J 2 . 

□ 